Module: Polars::LazyFunctions

Included in:
Polars
Defined in:
lib/polars/lazy_functions.rb

Instance Method Details

#all(name = nil) ⇒ Expr

Do one of two things, depending on whether a name is given.

  • with a name: apply a columnwise AND operation to that column
  • without a name: select all columns (wildcard column selection)

Examples:

Sum all columns

df = Polars::DataFrame.new(
  {"a" => [1, 2, 3], "b" => ["hello", "foo", "bar"], "c" => [1, 1, 1]}
)
df.select(Polars.all.sum)
# =>
# shape: (1, 3)
# ┌─────┬──────┬─────┐
# │ a   ┆ b    ┆ c   │
# │ --- ┆ ---  ┆ --- │
# │ i64 ┆ str  ┆ i64 │
# ╞═════╪══════╪═════╡
# │ 6   ┆ null ┆ 3   │
# └─────┴──────┴─────┘

Parameters:

  • name (Object) (defaults to: nil)

    If given, this function applies a bitwise & to the named column. If omitted, all columns are selected.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 576

def all(name = nil)
  if name.nil?
    col("*")
  elsif Utils.strlike?(name)
    col(name).all
  else
    raise Todo
  end
end

#any(name) ⇒ Expr

Evaluate columnwise or elementwise with a bitwise OR operation.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 481

def any(name)
  if Utils.strlike?(name)
    col(name).any
  else
    fold(lit(false), ->(a, b) { a.cast(:bool) | b.cast(:bool) }, name).alias("any")
  end
end

#arg_sort_by(exprs, reverse: false) ⇒ Expr Also known as: argsort_by

Find the indexes that would sort the columns.

Argsort by multiple columns. The first column will be used for the ordering. If there are duplicates in the first column, the second column will be used to determine the ordering and so on.

Parameters:

  • exprs (Object)

    Columns used to determine the ordering.

  • reverse (Boolean) (defaults to: false)

    Whether to sort in descending order. Default is ascending.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 662

def arg_sort_by(exprs, reverse: false)
  if !exprs.is_a?(::Array)
    exprs = [exprs]
  end
  if reverse == true || reverse == false
    reverse = [reverse] * exprs.length
  end
  exprs = Utils.selection_to_rbexpr_list(exprs)
  Utils.wrap_expr(RbExpr.arg_sort_by(exprs, reverse))
end
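
The tie-breaking rule described above can be illustrated without Polars. The following is a minimal plain-Ruby sketch that uses arrays in place of columns; `arg_sort_by_sketch` is a hypothetical helper, not part of the library, and it simplifies `reverse` to a single flag that reverses the whole ordering.

```ruby
# Return the row indexes that would sort the rows by several key columns.
# Each row is compared by its list of key values, so ties in the first
# column fall through to the second column, and so on.
# Hypothetical helper for illustration only.
def arg_sort_by_sketch(columns, reverse: false)
  row_count = columns.first.length
  order = (0...row_count).sort_by { |i| columns.map { |column| column[i] } }
  reverse ? order.reverse : order
end

a = [2, 1, 2, 1]
b = [9, 8, 7, 6]
p arg_sort_by_sketch([a, b]) # => [3, 1, 2, 0]
```

Rows 1 and 3 tie on the first column, so the second column decides that row 3 comes first.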

#arg_where(condition, eager: false) ⇒ Expr, Series

Return indices where condition evaluates true.

Examples:

df = Polars::DataFrame.new({"a" => [1, 2, 3, 4, 5]})
df.select(
  [
    Polars.arg_where(Polars.col("a") % 2 == 0)
  ]
).to_series
# =>
# shape: (2,)
# Series: 'a' [u32]
# [
#         1
#         3
# ]

Parameters:

  • condition (Expr)

    Boolean expression to evaluate

  • eager (Boolean) (defaults to: false)

    Whether to apply this function eagerly (as opposed to lazily).

Returns:

  • (Expr, Series)

# File 'lib/polars/lazy_functions.rb', line 1048

def arg_where(condition, eager: false)
  if eager
    if !condition.is_a?(Series)
      raise ArgumentError, "expected 'Series' in 'arg_where' if 'eager=True', got #{condition.class.name}"
    end
    condition.to_frame.select(arg_where(Polars.col(condition.name))).to_series
  else
    condition = Utils.expr_to_lit_or_expr(condition, str_to_lit: true)
    Utils.wrap_expr(_arg_where(condition._rbexpr))
  end
end
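
As a rough mental model, the result is the list of positions at which a boolean mask is true. A plain-Ruby sketch over an array follows; `arg_where_sketch` is a hypothetical name and the mask is an ordinary Ruby array rather than a Polars expression.

```ruby
# Collect the indexes at which a boolean mask is true.
# Hypothetical helper for illustration only.
def arg_where_sketch(mask)
  mask.each_index.select { |i| mask[i] == true }
end

values = [1, 2, 3, 4, 5]
p arg_where_sketch(values.map(&:even?)) # => [1, 3]
```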

#avg(column) ⇒ Expr, Float

Get the mean value.

Returns:

  • (Expr, Float)

# File 'lib/polars/lazy_functions.rb', line 165

def avg(column)
  mean(column)
end

#coalesce(exprs, *more_exprs) ⇒ Expr

Folds the expressions from left to right, keeping the first non-null value.

Examples:

df = Polars::DataFrame.new(
  [
    [nil, 1.0, 1.0],
    [nil, 2.0, 2.0],
    [nil, nil, 3.0],
    [nil, nil, nil]
  ],
  columns: [["a", :f64], ["b", :f64], ["c", :f64]]
)
df.with_column(Polars.coalesce(["a", "b", "c", 99.9]).alias("d"))
# =>
# shape: (4, 4)
# ┌──────┬──────┬──────┬──────┐
# │ a    ┆ b    ┆ c    ┆ d    │
# │ ---  ┆ ---  ┆ ---  ┆ ---  │
# │ f64  ┆ f64  ┆ f64  ┆ f64  │
# ╞══════╪══════╪══════╪══════╡
# │ null ┆ 1.0  ┆ 1.0  ┆ 1.0  │
# │ null ┆ 2.0  ┆ 2.0  ┆ 2.0  │
# │ null ┆ null ┆ 3.0  ┆ 3.0  │
# │ null ┆ null ┆ null ┆ 99.9 │
# └──────┴──────┴──────┴──────┘

Parameters:

  • exprs (Object)

    Expressions to coalesce.

  • more_exprs (Object)

    Additional expressions to coalesce, passed as positional arguments.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 1090

def coalesce(exprs, *more_exprs)
  exprs = Utils.selection_to_rbexpr_list(exprs)
  if more_exprs.any?
    exprs.concat(Utils.selection_to_rbexpr_list(more_exprs))
  end
  Utils.wrap_expr(_coalesce_exprs(exprs))
end
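
The "first non-null value, left to right" rule can be sketched in plain Ruby with arrays standing in for columns. `coalesce_rows` and its `default` argument are hypothetical names used only for this illustration; the data matches the example above.

```ruby
# For each row, keep the first non-nil value scanning the columns left to
# right, falling back to a default when every column is nil.
# Hypothetical helper for illustration only.
def coalesce_rows(columns, default = nil)
  row_count = columns.first.length
  (0...row_count).map do |i|
    first_present = columns.map { |column| column[i] }.find { |value| !value.nil? }
    first_present.nil? ? default : first_present
  end
end

a = [nil, nil, nil, nil]
b = [1.0, 2.0, nil, nil]
c = [1.0, 2.0, 3.0, nil]
p coalesce_rows([a, b, c], 99.9) # => [1.0, 2.0, 3.0, 99.9]
```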

#col(name) ⇒ Expr

Return an expression representing a column in a DataFrame.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 6

def col(name)
  if name.is_a?(Series)
    name = name.to_a
  end

  if name.is_a?(Class) && name < DataType
    name = [name]
  end

  if name.is_a?(DataType)
    Utils.wrap_expr(_dtype_cols([name]))
  elsif name.is_a?(::Array)
    if name.length == 0 || Utils.strlike?(name[0])
      name = name.map { |v| v.is_a?(Symbol) ? v.to_s : v }
      Utils.wrap_expr(RbExpr.cols(name))
    elsif Utils.is_polars_dtype(name[0])
      Utils.wrap_expr(_dtype_cols(name))
    else
      raise ArgumentError, "Expected list values to be all `str` or all `DataType`"
    end
  else
    name = name.to_s if name.is_a?(Symbol)
    Utils.wrap_expr(RbExpr.col(name))
  end
end

#collect_all(lazy_frames, type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ Array

Collect multiple LazyFrames at the same time.

This runs all the computation graphs in parallel on the Polars thread pool.

Parameters:

  • lazy_frames (Array)

    A list of LazyFrames to collect.

  • type_coercion (Boolean) (defaults to: true)

    Do type coercion optimization.

  • predicate_pushdown (Boolean) (defaults to: true)

    Do predicate pushdown optimization.

  • projection_pushdown (Boolean) (defaults to: true)

    Do projection pushdown optimization.

  • simplify_expression (Boolean) (defaults to: true)

    Run simplify expressions optimization.

  • string_cache (Boolean) (defaults to: false)

    This argument is deprecated and will be ignored

  • no_optimization (Boolean) (defaults to: false)

    Turn off optimizations.

  • slice_pushdown (Boolean) (defaults to: true)

    Slice pushdown optimization.

  • common_subplan_elimination (Boolean) (defaults to: true)

    Will try to cache branching subplans that occur on self-joins or unions.

  • allow_streaming (Boolean) (defaults to: false)

    Run parts of the query in a streaming fashion (this is in an alpha state)

Returns:

  • (Array)

# File 'lib/polars/lazy_functions.rb', line 889

def collect_all(
  lazy_frames,
  type_coercion: true,
  predicate_pushdown: true,
  projection_pushdown: true,
  simplify_expression: true,
  string_cache: false,
  no_optimization: false,
  slice_pushdown: true,
  common_subplan_elimination: true,
  allow_streaming: false
)
  if no_optimization
    predicate_pushdown = false
    projection_pushdown = false
    slice_pushdown = false
    common_subplan_elimination = false
  end

  prepared = []

  lazy_frames.each do |lf|
    ldf = lf._ldf.optimization_toggle(
      type_coercion,
      predicate_pushdown,
      projection_pushdown,
      simplify_expression,
      slice_pushdown,
      common_subplan_elimination,
      allow_streaming,
      false
    )
    prepared << ldf
  end

  out = _collect_all(prepared)

  # wrap the rbdataframes into dataframe
  result = out.map { |rbdf| Utils.wrap_df(rbdf) }

  result
end

#concat_list(exprs) ⇒ Expr

Concatenate the arrays in a Series of dtype List in linear time.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 858

def concat_list(exprs)
  exprs = Utils.selection_to_rbexpr_list(exprs)
  Utils.wrap_expr(RbExpr.concat_lst(exprs))
end

#concat_str(exprs, sep: "") ⇒ Expr

Horizontally concat Utf8 Series in linear time. Non-Utf8 columns are cast to Utf8.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2, 3],
    "b" => ["dogs", "cats", nil],
    "c" => ["play", "swim", "walk"]
  }
)
df.with_columns(
  [
    Polars.concat_str(
      [
        Polars.col("a") * 2,
        Polars.col("b"),
        Polars.col("c")
      ],
      sep: " "
    ).alias("full_sentence")
  ]
)
# =>
# shape: (3, 4)
# ┌─────┬──────┬──────┬───────────────┐
# │ a   ┆ b    ┆ c    ┆ full_sentence │
# │ --- ┆ ---  ┆ ---  ┆ ---           │
# │ i64 ┆ str  ┆ str  ┆ str           │
# ╞═════╪══════╪══════╪═══════════════╡
# │ 1   ┆ dogs ┆ play ┆ 2 dogs play   │
# │ 2   ┆ cats ┆ swim ┆ 4 cats swim   │
# │ 3   ┆ null ┆ walk ┆ null          │
# └─────┴──────┴──────┴───────────────┘

Parameters:

  • exprs (Object)

    Columns to concat into a Utf8 Series.

  • sep (String) (defaults to: "")

    String value that will be used to separate the values.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 797

def concat_str(exprs, sep: "")
  exprs = Utils.selection_to_rbexpr_list(exprs)
  Utils.wrap_expr(RbExpr.concat_str(exprs, sep))
end
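
The example output above shows that a null in any input makes the whole concatenated value null. A plain-Ruby sketch of that row-wise behaviour, with arrays in place of columns, follows; `concat_str_rows` is a hypothetical helper and only models what the example shows.

```ruby
# Join the values of each row with a separator, converting non-strings
# with to_s. A nil anywhere in the row yields nil for that row.
# Hypothetical helper for illustration only.
def concat_str_rows(columns, sep: "")
  row_count = columns.first.length
  (0...row_count).map do |i|
    parts = columns.map { |column| column[i] }
    parts.any?(&:nil?) ? nil : parts.map(&:to_s).join(sep)
  end
end

doubled = [2, 4, 6]
animals = ["dogs", "cats", nil]
verbs = ["play", "swim", "walk"]
p concat_str_rows([doubled, animals, verbs], sep: " ")
# => ["2 dogs play", "4 cats swim", nil]
```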

#count(column = nil) ⇒ Expr, Integer

Count the number of values in this column/context.

Parameters:

  • column (String, Series, nil) (defaults to: nil)

    If dtype is:

    • Series : count the values in the series.
    • String : count the values in this column.
    • nil : count the number of values in this context.

Returns:

  • (Expr, Integer)

# File 'lib/polars/lazy_functions.rb', line 66

def count(column = nil)
  if column.nil?
    return Utils.wrap_expr(RbExpr.count)
  end

  if column.is_a?(Series)
    column.len
  else
    col(column).count
  end
end

#cov(a, b) ⇒ Expr

Compute the covariance between two columns/expressions.

Parameters:

  • a (Object)

    Column name or Expression.

  • b (Object)

    Column name or Expression.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 413

def cov(a, b)
  if Utils.strlike?(a)
    a = col(a)
  end
  if Utils.strlike?(b)
    b = col(b)
  end
  Utils.wrap_expr(RbExpr.cov(a._rbexpr, b._rbexpr))
end

#cumfold(acc, f, exprs, include_init: false) ⇒ Object

Note:

If you simply want the first encountered expression as accumulator, consider using cumreduce.

Cumulatively accumulate over multiple columns horizontally/row wise with a left fold.

Every cumulative result is added as a separate field in a Struct column.

Parameters:

  • acc (Object)

    Accumulator Expression. This is the value that will be initialized when the fold starts. For a sum this could for instance be lit(0).

  • f (Object)

    Function to apply over the accumulator and the value. Fn(acc, value) -> new_value

  • exprs (Object)

    Expressions to aggregate over. May also be a wildcard expression.

  • include_init (Boolean) (defaults to: false)

    Include the initial accumulator state as struct field.

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 465

def cumfold(acc, f, exprs, include_init: false)
  acc = Utils.expr_to_lit_or_expr(acc, str_to_lit: true)
  if exprs.is_a?(Expr)
    exprs = [exprs]
  end

  exprs = Utils.selection_to_rbexpr_list(exprs)
  Utils.wrap_expr(RbExpr.cumfold(acc._rbexpr, f, exprs, include_init))
end
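
To make the cumulative left fold concrete, here is a plain-Ruby sketch that works on arrays instead of Polars columns. Each row yields the list of intermediate accumulator values, which is what the Struct fields hold. `cumfold_rows` is a hypothetical helper and not part of the library.

```ruby
# For each row, fold the column values from left to right and keep every
# intermediate accumulator. With include_init, the starting value is kept too.
# Hypothetical helper for illustration only.
def cumfold_rows(initial, columns, include_init: false, &step)
  row_count = columns.first.length
  (0...row_count).map do |i|
    acc = initial
    steps = columns.map { |column| acc = step.call(acc, column[i]) }
    include_init ? [initial] + steps : steps
  end
end

a = [1, 2]
c = [5, 6]
p cumfold_rows(0, [a, c]) { |acc, value| acc + value } # => [[1, 6], [2, 8]]
```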

#cumsum(column) ⇒ Object

Cumulatively sum values in a column/Series, or horizontally across a list of columns/expressions.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2],
    "b" => [3, 4],
    "c" => [5, 6]
  }
)
# =>
# shape: (2, 3)
# ┌─────┬─────┬─────┐
# │ a   ┆ b   ┆ c   │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 3   ┆ 5   │
# │ 2   ┆ 4   ┆ 6   │
# └─────┴─────┴─────┘

Cumulatively sum a column by name:

df.select(Polars.cumsum("a"))
# =>
# shape: (2, 1)
# ┌─────┐
# │ a   │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# │ 3   │
# └─────┘

Cumulatively sum a list of columns/expressions horizontally:

df.with_column(Polars.cumsum(["a", "c"]))
# =>
# shape: (2, 4)
# ┌─────┬─────┬─────┬───────────┐
# │ a   ┆ b   ┆ c   ┆ cumsum    │
# │ --- ┆ --- ┆ --- ┆ ---       │
# │ i64 ┆ i64 ┆ i64 ┆ struct[2] │
# ╞═════╪═════╪═════╪═══════════╡
# │ 1   ┆ 3   ┆ 5   ┆ {1,6}     │
# │ 2   ┆ 4   ┆ 6   ┆ {2,8}     │
# └─────┴─────┴─────┴───────────┘

Parameters:

  • column (Object)

    Column(s) to be used in aggregation.

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 349

def cumsum(column)
  if column.is_a?(Series)
    column.cumsum
  elsif Utils.strlike?(column)
    col(column).cumsum
  else
    cumfold(lit(0).cast(:u32), ->(a, b) { a + b }, column).alias("cumsum")
  end
end

#duration(weeks: nil, days: nil, hours: nil, minutes: nil, seconds: nil, milliseconds: nil, microseconds: nil, nanoseconds: nil, time_unit: "us") ⇒ Expr

Create polars Duration from distinct time components.

Examples:

df = Polars::DataFrame.new(
  {
    "datetime" => [DateTime.new(2022, 1, 1), DateTime.new(2022, 1, 2)],
    "add" => [1, 2]
  }
)
df.select(
  [
    (Polars.col("datetime") + Polars.duration(weeks: "add")).alias("add_weeks"),
    (Polars.col("datetime") + Polars.duration(days: "add")).alias("add_days"),
    (Polars.col("datetime") + Polars.duration(seconds: "add")).alias("add_seconds"),
    (Polars.col("datetime") + Polars.duration(milliseconds: "add")).alias(
      "add_milliseconds"
    ),
    (Polars.col("datetime") + Polars.duration(hours: "add")).alias("add_hours")
  ]
)
# =>
# shape: (2, 5)
# ┌─────────────────────┬─────────────────────┬─────────────────────┬─────────────────────────┬─────────────────────┐
# │ add_weeks           ┆ add_days            ┆ add_seconds         ┆ add_milliseconds        ┆ add_hours           │
# │ ---                 ┆ ---                 ┆ ---                 ┆ ---                     ┆ ---                 │
# │ datetime[ns]        ┆ datetime[ns]        ┆ datetime[ns]        ┆ datetime[ns]            ┆ datetime[ns]        │
# ╞═════════════════════╪═════════════════════╪═════════════════════╪═════════════════════════╪═════════════════════╡
# │ 2022-01-08 00:00:00 ┆ 2022-01-02 00:00:00 ┆ 2022-01-01 00:00:01 ┆ 2022-01-01 00:00:00.001 ┆ 2022-01-01 01:00:00 │
# │ 2022-01-16 00:00:00 ┆ 2022-01-04 00:00:00 ┆ 2022-01-02 00:00:02 ┆ 2022-01-02 00:00:00.002 ┆ 2022-01-02 02:00:00 │
# └─────────────────────┴─────────────────────┴─────────────────────┴─────────────────────────┴─────────────────────┘

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 706

def duration(
  weeks: nil,
  days: nil,
  hours: nil,
  minutes: nil,
  seconds: nil,
  milliseconds: nil,
  microseconds: nil,
  nanoseconds: nil,
  time_unit: "us"
)
  if !weeks.nil?
    weeks = Utils.expr_to_lit_or_expr(weeks, str_to_lit: false)._rbexpr
  end
  if !days.nil?
    days = Utils.expr_to_lit_or_expr(days, str_to_lit: false)._rbexpr
  end
  if !hours.nil?
    hours = Utils.expr_to_lit_or_expr(hours, str_to_lit: false)._rbexpr
  end
  if !minutes.nil?
    minutes = Utils.expr_to_lit_or_expr(minutes, str_to_lit: false)._rbexpr
  end
  if !seconds.nil?
    seconds = Utils.expr_to_lit_or_expr(seconds, str_to_lit: false)._rbexpr
  end
  if !milliseconds.nil?
    milliseconds = Utils.expr_to_lit_or_expr(milliseconds, str_to_lit: false)._rbexpr
  end
  if !microseconds.nil?
    microseconds = Utils.expr_to_lit_or_expr(microseconds, str_to_lit: false)._rbexpr
  end
  if !nanoseconds.nil?
    nanoseconds = Utils.expr_to_lit_or_expr(nanoseconds, str_to_lit: false)._rbexpr
  end

  Utils.wrap_expr(
    _rb_duration(
      weeks,
      days,
      hours,
      minutes,
      seconds,
      milliseconds,
      microseconds,
      nanoseconds,
      time_unit
    )
  )
end
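
Conceptually, the separate components add up to one time span. The plain-Ruby sketch below converts integer components to a total number of microseconds. It assumes a microsecond resolution, matching the default `time_unit` of "us", and leaves out nanoseconds; `duration_in_microseconds` is a hypothetical helper, not how the library computes the value internally.

```ruby
# Add up time components into a single number of microseconds.
# Hypothetical helper for illustration only.
def duration_in_microseconds(**components)
  microseconds_per = {
    weeks: 604_800_000_000,
    days: 86_400_000_000,
    hours: 3_600_000_000,
    minutes: 60_000_000,
    seconds: 1_000_000,
    milliseconds: 1_000,
    microseconds: 1
  }
  components.sum { |unit, amount| microseconds_per.fetch(unit) * amount }
end

p duration_in_microseconds(days: 1, hours: 1) # => 90000000000
```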

#element ⇒ Expr

Alias for an element being evaluated in an eval expression.

Examples:

A horizontal rank computation by taking the elements of a list

df = Polars::DataFrame.new({"a" => [1, 8, 3], "b" => [4, 5, 2]})
df.with_column(
  Polars.concat_list(["a", "b"]).list.eval(Polars.element.rank).alias("rank")
)
# =>
# shape: (3, 3)
# ┌─────┬─────┬────────────┐
# │ a   ┆ b   ┆ rank       │
# │ --- ┆ --- ┆ ---        │
# │ i64 ┆ i64 ┆ list[f64]  │
# ╞═════╪═════╪════════════╡
# │ 1   ┆ 4   ┆ [1.0, 2.0] │
# │ 8   ┆ 5   ┆ [2.0, 1.0] │
# │ 3   ┆ 2   ┆ [2.0, 1.0] │
# └─────┴─────┴────────────┘

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 52

def element
  col("")
end

#exclude(columns) ⇒ Object

Exclude certain columns from a wildcard/regex selection.

Examples:

df = Polars::DataFrame.new(
  {
    "aa" => [1, 2, 3],
    "ba" => ["a", "b", nil],
    "cc" => [nil, 2.5, 1.5]
  }
)
# =>
# shape: (3, 3)
# ┌─────┬──────┬──────┐
# │ aa  ┆ ba   ┆ cc   │
# │ --- ┆ ---  ┆ ---  │
# │ i64 ┆ str  ┆ f64  │
# ╞═════╪══════╪══════╡
# │ 1   ┆ a    ┆ null │
# │ 2   ┆ b    ┆ 2.5  │
# │ 3   ┆ null ┆ 1.5  │
# └─────┴──────┴──────┘

Exclude by column name(s):

df.select(Polars.exclude("ba"))
# =>
# shape: (3, 2)
# ┌─────┬──────┐
# │ aa  ┆ cc   │
# │ --- ┆ ---  │
# │ i64 ┆ f64  │
# ╞═════╪══════╡
# │ 1   ┆ null │
# │ 2   ┆ 2.5  │
# │ 3   ┆ 1.5  │
# └─────┴──────┘

Exclude by regex, e.g. removing all columns whose names end with the letter "a":

df.select(Polars.exclude("^.*a$"))
# =>
# shape: (3, 1)
# ┌──────┐
# │ cc   │
# │ ---  │
# │ f64  │
# ╞══════╡
# │ null │
# │ 2.5  │
# │ 1.5  │
# └──────┘

Parameters:

  • columns (Object)

    Column(s) to exclude from selection. This can be:

    • a column name, or multiple column names
    • a regular expression starting with ^ and ending with $
    • a dtype or multiple dtypes

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 548

def exclude(columns)
  col("*").exclude(columns)
end
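
The name and regex cases from the parameter list can be sketched over a plain list of column names. This is an illustration only: `exclude_columns` is a hypothetical helper, it ignores the dtype case, and it assumes that a string is treated as a regular expression exactly when it starts with ^ and ends with $.

```ruby
# Drop column names that match any of the given names or anchored patterns.
# Hypothetical helper for illustration only.
def exclude_columns(names, columns)
  patterns = Array(columns)
  names.reject do |name|
    patterns.any? do |pattern|
      if pattern.start_with?("^") && pattern.end_with?("$")
        Regexp.new(pattern).match?(name)
      else
        pattern == name
      end
    end
  end
end

names = ["aa", "ba", "cc"]
p exclude_columns(names, "ba")    # => ["aa", "cc"]
p exclude_columns(names, "^.*a$") # => ["cc"]
```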

#first(column = nil) ⇒ Object

Get the first value.

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 194

def first(column = nil)
  if column.nil?
    return Utils.wrap_expr(RbExpr.first)
  end

  if column.is_a?(Series)
    if column.len > 0
      column[0]
    else
      raise IndexError, "The series is empty, so no first value can be returned."
    end
  else
    col(column).first
  end
end

#fold(acc, f, exprs) ⇒ Expr

Accumulate over multiple columns horizontally/row wise with a left fold.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 432

def fold(acc, f, exprs)
  acc = Utils.expr_to_lit_or_expr(acc, str_to_lit: true)
  if exprs.is_a?(Expr)
    exprs = [exprs]
  end

  exprs = Utils.selection_to_rbexpr_list(exprs)
  Utils.wrap_expr(RbExpr.fold(acc._rbexpr, f, exprs))
end
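
A horizontal left fold starts from the accumulator and combines it with each column's value in turn, once per row. The plain-Ruby sketch below shows the idea with arrays in place of columns; `fold_rows` is a hypothetical helper, not part of the library.

```ruby
# For each row, combine the column values from left to right, starting
# from the initial accumulator.
# Hypothetical helper for illustration only.
def fold_rows(initial, columns, &step)
  row_count = columns.first.length
  (0...row_count).map do |i|
    columns.inject(initial) { |acc, column| step.call(acc, column[i]) }
  end
end

a = [1, 2]
b = [3, 4]
c = [5, 6]
p fold_rows(0, [a, b, c]) { |acc, value| acc + value } # => [9, 12]
```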

#format(fstring, *args) ⇒ Expr

Format expressions as a string.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => ["a", "b", "c"],
    "b" => [1, 2, 3]
  }
)
df.select(
  [
    Polars.format("foo_{}_bar_{}", Polars.col("a"), "b").alias("fmt")
  ]
)
# =>
# shape: (3, 1)
# ┌─────────────┐
# │ fmt         │
# │ ---         │
# │ str         │
# ╞═════════════╡
# │ foo_a_bar_1 │
# │ foo_b_bar_2 │
# │ foo_c_bar_3 │
# └─────────────┘

Parameters:

  • fstring (String)

    A string with placeholders. For example: "hello_{}" or "{}_world".

  • args (Object)

    Expression(s) that fill the placeholders.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 835

def format(fstring, *args)
  if fstring.scan("{}").length != args.length
    raise ArgumentError, "number of placeholders should equal the number of arguments"
  end

  exprs = []

  arguments = args.each
  fstring.split(/(\{\})/).each do |s|
    if s == "{}"
      e = Utils.expr_to_lit_or_expr(arguments.next, str_to_lit: false)
      exprs << e
    elsif s.length > 0
      exprs << lit(s)
    end
  end

  concat_str(exprs, sep: "")
end
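
The placeholder handling in the source above relies on splitting with a capturing group, which keeps the "{}" markers as separate pieces. The same technique applied to plain Ruby values looks like this; `format_sketch` is a hypothetical helper that returns a String instead of an expression.

```ruby
# Split the template on "{}" while keeping the placeholders, then replace
# each placeholder with the next argument and join the pieces.
# Hypothetical helper for illustration only.
def format_sketch(fstring, *args)
  if fstring.scan("{}").length != args.length
    raise ArgumentError, "number of placeholders should equal the number of arguments"
  end

  remaining = args.each
  pieces = fstring.split(/(\{\})/).reject(&:empty?)
  pieces.map { |piece| piece == "{}" ? remaining.next.to_s : piece }.join
end

puts format_sketch("foo_{}_bar_{}", "a", 1) # prints foo_a_bar_1
```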

#from_epoch(column, unit: "s", eager: false) ⇒ Object

Utility function that parses an epoch timestamp (or Unix time) to Polars Date(time).

Depending on the unit provided, this function will return a different dtype:

  • unit: "d" returns pl.Date
  • unit: "s" returns pl.Datetime["us"]
  • unit: "ms" returns pl.Datetime["ms"]
  • unit: "us" returns pl.Datetime["us"]
  • unit: "ns" returns pl.Datetime["ns"]

Examples:

df = Polars::DataFrame.new({"timestamp" => [1666683077, 1666683099]}).lazy
df.select(Polars.from_epoch(Polars.col("timestamp"), unit: "s")).collect
# =>
# shape: (2, 1)
# ┌─────────────────────┐
# │ timestamp           │
# │ ---                 │
# │ datetime[μs]        │
# ╞═════════════════════╡
# │ 2022-10-25 07:31:17 │
# │ 2022-10-25 07:31:39 │
# └─────────────────────┘

Parameters:

  • column (Object)

    Series or expression to parse integers to pl.Datetime.

  • unit (String) (defaults to: "s")

    The unit of the timesteps since epoch time.

  • eager (Boolean) (defaults to: false)

    If eager evaluation is true, a Series is returned instead of an Expr.

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 1129

def from_epoch(column, unit: "s", eager: false)
  if Utils.strlike?(column)
    column = col(column)
  elsif !column.is_a?(Series) && !column.is_a?(Expr)
    column = Series.new(column)
  end

  if unit == "d"
    expr = column.cast(Date)
  elsif unit == "s"
    expr = (column.cast(Int64) * 1_000_000).cast(Datetime.new("us"))
  elsif Utils::DTYPE_TEMPORAL_UNITS.include?(unit)
    expr = column.cast(Datetime.new(unit))
  else
    raise ArgumentError, "'unit' must be one of {'ns', 'us', 'ms', 's', 'd'}, got '#{unit}'."
  end

  if eager
    if !column.is_a?(Series)
      raise ArgumentError, "expected Series or Array if eager: true, got #{column.class.name}"
    else
      column.to_frame.select(expr).to_series
    end
  else
    expr
  end
end
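
The unit handling can be illustrated with Ruby's standard library alone. The sketch below converts a single integer rather than a column, returning a `Date` for "d" and a UTC `Time` otherwise; `from_epoch_sketch` is a hypothetical helper and does not reproduce the Polars dtypes.

```ruby
require "date"

# Interpret an integer as a count of days, seconds, milliseconds,
# microseconds or nanoseconds since 1970-01-01 UTC.
# Hypothetical helper for illustration only.
def from_epoch_sketch(value, unit: "s")
  case unit
  when "d"  then Date.new(1970, 1, 1) + value
  when "s"  then Time.at(value).utc
  when "ms" then Time.at(Rational(value, 1_000)).utc
  when "us" then Time.at(Rational(value, 1_000_000)).utc
  when "ns" then Time.at(Rational(value, 1_000_000_000)).utc
  else raise ArgumentError, "'unit' must be one of 'ns', 'us', 'ms', 's', 'd', got '#{unit}'."
  end
end

puts from_epoch_sketch(1666683077).strftime("%Y-%m-%d %H:%M:%S")
# prints 2022-10-25 07:31:17
```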

#groups(column) ⇒ Object

Syntactic sugar for Polars.col("foo").agg_groups.

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 589

def groups(column)
  col(column).agg_groups
end

#head(column, n = 10) ⇒ Object

Get the first n rows.

Parameters:

  • column (Object)

    Column name or Series.

  • n (Integer) (defaults to: 10)

    Number of rows to return.

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 242

def head(column, n = 10)
  if column.is_a?(Series)
    column.head(n)
  else
    col(column).head(n)
  end
end

#int_range(start, stop, step: 1, eager: false, dtype: nil) ⇒ Expr, Series Also known as: arange

Create a range expression (or Series).

This can be used in a select, with_column, etc. Be sure that the resulting range size is equal to the length of the DataFrame you are collecting.

Examples:

Polars.arange(0, 3, eager: true)
# =>
# shape: (3,)
# Series: 'arange' [i64]
# [
#         0
#         1
#         2
# ]

Parameters:

  • start (Integer, Expr, Series)

    Lower bound of range.

  • stop (Integer, Expr, Series)

    Upper bound of range.

  • step (Integer) (defaults to: 1)

    Step size of the range.

  • eager (Boolean) (defaults to: false)

    If eager evaluation is true, a Series is returned instead of an Expr.

  • dtype (Symbol) (defaults to: nil)

    Apply an explicit integer dtype to the resulting expression (default is Int64).

Returns:

  • (Expr, Series)

# File 'lib/polars/lazy_functions.rb', line 635

def int_range(start, stop, step: 1, eager: false, dtype: nil)
  start = Utils.parse_as_expression(start)
  stop = Utils.parse_as_expression(stop)
  dtype ||= Int64
  dtype = dtype.to_s if dtype.is_a?(Symbol)
  result = Utils.wrap_expr(RbExpr.int_range(start, stop, step, dtype)).alias("arange")

  if eager
    return select(result).to_series
  end

  result
end
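
As the example above shows, the upper bound is excluded. A plain-Ruby equivalent for positive steps, returning an Array rather than a Series, is sketched below; `int_range_sketch` is a hypothetical helper.

```ruby
# Integers from start up to, but not including, stop, in increments of step.
# Only positive steps are handled in this sketch.
# Hypothetical helper for illustration only.
def int_range_sketch(start, stop, step: 1)
  (start...stop).step(step).to_a
end

p int_range_sketch(0, 3)           # => [0, 1, 2]
p int_range_sketch(0, 10, step: 3) # => [0, 3, 6, 9]
```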

#last(column = nil) ⇒ Object

Get the last value.

Depending on the input type this function does different things:

  • nil -> expression to take last column of a context.
  • String -> syntactic sugar for Polars.col(..).last
  • Series -> Take last value in Series

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 219

def last(column = nil)
  if column.nil?
    return Utils.wrap_expr(_last)
  end

  if column.is_a?(Series)
    if column.len > 0
      return column[-1]
    else
      raise IndexError, "The series is empty, so no last value can be returned"
    end
  end
  col(column).last
end

#lit(value, dtype: nil, allow_object: nil) ⇒ Expr

Return an expression representing a literal value.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 269

def lit(value, dtype: nil, allow_object: nil)
  if value.is_a?(::Time) || value.is_a?(::DateTime)
    time_unit = dtype&.time_unit || "ns"
    time_zone = dtype&.time_zone
    e = lit(Utils._datetime_to_pl_timestamp(value, time_unit)).cast(Datetime.new(time_unit))
    if time_zone
      return e.dt.replace_time_zone(time_zone.to_s)
    else
      return e
    end
  elsif value.is_a?(::Date)
    return lit(::Time.utc(value.year, value.month, value.day)).cast(Date)
  elsif value.is_a?(Polars::Series)
    name = value.name
    value = value._s
    e = Utils.wrap_expr(RbExpr.lit(value, allow_object))
    if name == ""
      return e
    end
    return e.alias(name)
  elsif (defined?(Numo::NArray) && value.is_a?(Numo::NArray)) || value.is_a?(::Array)
    return lit(Series.new("", value))
  elsif dtype
    return Utils.wrap_expr(RbExpr.lit(value, allow_object)).cast(dtype)
  end

  Utils.wrap_expr(RbExpr.lit(value, allow_object))
end

#max(column) ⇒ Expr, Object

Get the maximum value.

Parameters:

  • column (Object)

    Column(s) to be used in aggregation.

Returns:

  • (Expr, Object)

# File 'lib/polars/lazy_functions.rb', line 113

def max(column)
  if column.is_a?(Series)
    column.max
  else
    col(column).max
  end
end

#mean(column) ⇒ Expr, Float

Get the mean value.

Returns:

  • (Expr, Float)

# File 'lib/polars/lazy_functions.rb', line 154

def mean(column)
  if column.is_a?(Series)
    column.mean
  else
    col(column).mean
  end
end

#median(column) ⇒ Object

Get the median value.

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 172

def median(column)
  if column.is_a?(Series)
    column.median
  else
    col(column).median
  end
end

#min(column) ⇒ Expr, Object

Get the minimum value.

Parameters:

  • column (Object)

    Column(s) to be used in aggregation.

Returns:

  • (Expr, Object)

# File 'lib/polars/lazy_functions.rb', line 127

def min(column)
  if column.is_a?(Series)
    column.min
  else
    col(column).min
  end
end

#n_unique(column) ⇒ Object

Count unique values.

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 183

def n_unique(column)
  if column.is_a?(Series)
    column.n_unique
  else
    col(column).n_unique
  end
end

#pearson_corr(a, b, ddof: 1) ⇒ Expr

Compute the Pearson correlation between two columns.

Parameters:

  • a (Object)

    Column name or Expression.

  • b (Object)

    Column name or Expression.

  • ddof (Integer) (defaults to: 1)

    Delta degrees of freedom

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 395

def pearson_corr(a, b, ddof: 1)
  if Utils.strlike?(a)
    a = col(a)
  end
  if Utils.strlike?(b)
    b = col(b)
  end
  Utils.wrap_expr(RbExpr.pearson_corr(a._rbexpr, b._rbexpr, ddof))
end
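
For reference, the Pearson correlation is the covariance of the two inputs divided by the product of their standard deviations. The plain-Ruby sketch below computes it for two arrays of numbers. `pearson_corr_sketch` is a hypothetical helper; it omits `ddof` because the same normalising factor appears in the numerator and the denominator of the textbook formula and cancels out, and it does not handle nulls or constant inputs.

```ruby
# Pearson correlation of two equally long arrays of numbers.
# Hypothetical helper for illustration only.
def pearson_corr_sketch(xs, ys)
  n = xs.length.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  co_moment = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  spread_x = xs.sum { |x| (x - mean_x)**2 }
  spread_y = ys.sum { |y| (y - mean_y)**2 }
  co_moment / Math.sqrt(spread_x * spread_y)
end

p pearson_corr_sketch([1, 2, 3], [2, 4, 6]) # => 1.0
```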

#quantile(column, quantile, interpolation: "nearest") ⇒ Expr

Syntactic sugar for Polars.col("foo").quantile(...).

Parameters:

  • column (String)

    Column name.

  • quantile (Float)

    Quantile between 0.0 and 1.0.

  • interpolation ("nearest", "higher", "lower", "midpoint", "linear") (defaults to: "nearest")

    Interpolation method.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 603

def quantile(column, quantile, interpolation: "nearest")
  col(column).quantile(quantile, interpolation: interpolation)
end

#repeat(value, n, dtype: nil, eager: false, name: nil) ⇒ Expr

Repeat a single value n times.

Parameters:

  • value (Object)

    Value to repeat.

  • n (Integer)

    Repeat n times.

  • eager (Boolean) (defaults to: false)

    Run eagerly and collect into a Series.

  • name (String) (defaults to: nil)

    Only used in eager mode. As expression, use alias.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 1005

def repeat(value, n, dtype: nil, eager: false, name: nil)
  if !name.nil?
    warn "the `name` argument is deprecated. Use the `alias` method instead."
  end

  if n.is_a?(Integer)
    n = lit(n)
  end

  value = Utils.parse_as_expression(value, str_as_lit: true)
  expr = Utils.wrap_expr(RbExpr.repeat(value, n._rbexpr, dtype))
  if !name.nil?
    expr = expr.alias(name)
  end
  if eager
    return select(expr).to_series
  end
  expr
end

#select(exprs) ⇒ DataFrame

Run polars expressions without a context.

Returns:

  • (DataFrame)

# File 'lib/polars/lazy_functions.rb', line 935

def select(exprs)
  DataFrame.new([]).select(exprs)
end

#spearman_rank_corr(a, b, ddof: 1, propagate_nans: false) ⇒ Expr

Compute the Spearman rank correlation between two columns.

Missing data will be excluded from the computation.

Parameters:

  • a (Object)

    Column name or Expression.

  • b (Object)

    Column name or Expression.

  • ddof (Integer) (defaults to: 1)

    Delta degrees of freedom

  • propagate_nans (Boolean) (defaults to: false)

    If true, any NaN encountered will lead to NaN in the output. Defaults to false, where NaN values are regarded as larger than any finite number and thus lead to the highest rank.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 375

def spearman_rank_corr(a, b, ddof: 1, propagate_nans: false)
  if Utils.strlike?(a)
    a = col(a)
  end
  if Utils.strlike?(b)
    b = col(b)
  end
  Utils.wrap_expr(RbExpr.spearman_rank_corr(a._rbexpr, b._rbexpr, ddof, propagate_nans))
end

#std(column, ddof: 1) ⇒ Object

Get the standard deviation.

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 88

def std(column, ddof: 1)
  if column.is_a?(Series)
    column.std(ddof: ddof)
  else
    col(column).std(ddof: ddof)
  end
end

#struct(exprs, eager: false) ⇒ Object

Collect several columns into a Series of dtype Struct.

Examples:

Polars::DataFrame.new(
  {
    "int" => [1, 2],
    "str" => ["a", "b"],
    "bool" => [true, nil],
    "list" => [[1, 2], [3]],
  }
).select([Polars.struct(Polars.all).alias("my_struct")])
# =>
# shape: (2, 1)
# ┌─────────────────────┐
# │ my_struct           │
# │ ---                 │
# │ struct[4]           │
# ╞═════════════════════╡
# │ {1,"a",true,[1, 2]} │
# │ {2,"b",null,[3]}    │
# └─────────────────────┘

Only collect specific columns as a struct:

df = Polars::DataFrame.new(
  {"a" => [1, 2, 3, 4], "b" => ["one", "two", "three", "four"], "c" => [9, 8, 7, 6]}
)
df.with_column(Polars.struct(Polars.col(["a", "b"])).alias("a_and_b"))
# =>
# shape: (4, 4)
# ┌─────┬───────┬─────┬─────────────┐
# │ a   ┆ b     ┆ c   ┆ a_and_b     │
# │ --- ┆ ---   ┆ --- ┆ ---         │
# │ i64 ┆ str   ┆ i64 ┆ struct[2]   │
# ╞═════╪═══════╪═════╪═════════════╡
# │ 1   ┆ one   ┆ 9   ┆ {1,"one"}   │
# │ 2   ┆ two   ┆ 8   ┆ {2,"two"}   │
# │ 3   ┆ three ┆ 7   ┆ {3,"three"} │
# │ 4   ┆ four  ┆ 6   ┆ {4,"four"}  │
# └─────┴───────┴─────┴─────────────┘

Parameters:

  • exprs (Object)

    Columns/Expressions to collect into a Struct

  • eager (Boolean) (defaults to: false)

    Evaluate immediately

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 985

def struct(exprs, eager: false)
  if eager
    # Return early; otherwise the eager result is discarded and an Expr is returned.
    return Polars.select(struct(exprs, eager: false)).to_series
  end
  exprs = Utils.selection_to_rbexpr_list(exprs)
  Utils.wrap_expr(_as_struct(exprs))
end
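
Collecting columns into a struct amounts to turning each row into one record with a field per column. A plain-Ruby sketch using a Hash of arrays follows; `struct_rows` is a hypothetical helper and represents each struct value as a Ruby Hash.

```ruby
# Turn a Hash of equally long columns into one Hash per row.
# Hypothetical helper for illustration only.
def struct_rows(columns)
  names = columns.keys
  row_count = columns.values.first.length
  (0...row_count).map do |i|
    names.to_h { |name| [name, columns[name][i]] }
  end
end

p struct_rows({"a" => [1, 2], "b" => ["one", "two"]})
# => [{"a"=>1, "b"=>"one"}, {"a"=>2, "b"=>"two"}]
```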

#sum(column) ⇒ Object

Sum values in a column/Series, or horizontally across a list of columns/expressions.

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 138

def sum(column)
  if column.is_a?(Series)
    column.sum
  elsif Utils.strlike?(column)
    col(column.to_s).sum
  elsif column.is_a?(::Array)
    exprs = Utils.selection_to_rbexpr_list(column)
    Utils.wrap_expr(_sum_horizontal(exprs))
  else
    fold(lit(0).cast(:u32), ->(a, b) { a + b }, column).alias("sum")
  end
end

#tail(column, n = 10) ⇒ Object

Get the last n rows.

Parameters:

  • column (Object)

    Column name or Series.

  • n (Integer) (defaults to: 10)

    Number of rows to return.

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 258

def tail(column, n = 10)
  if column.is_a?(Series)
    column.tail(n)
  else
    col(column).tail(n)
  end
end

#to_list(name) ⇒ Expr

Aggregate to list.

Returns:

  • (Expr)

# File 'lib/polars/lazy_functions.rb', line 81

def to_list(name)
  col(name).list
end

#var(column, ddof: 1) ⇒ Object

Get the variance.

Returns:

  • (Object)

# File 'lib/polars/lazy_functions.rb', line 99

def var(column, ddof: 1)
  if column.is_a?(Series)
    column.var(ddof: ddof)
  else
    col(column).var(ddof: ddof)
  end
end

#when(expr) ⇒ When

Start a "when, then, otherwise" expression.

Examples:

df = Polars::DataFrame.new({"foo" => [1, 3, 4], "bar" => [3, 4, 0]})
df.with_column(Polars.when(Polars.col("foo") > 2).then(Polars.lit(1)).otherwise(Polars.lit(-1)))
# =>
# shape: (3, 3)
# ┌─────┬─────┬─────────┐
# │ foo ┆ bar ┆ literal │
# │ --- ┆ --- ┆ ---     │
# │ i64 ┆ i64 ┆ i32     │
# ╞═════╪═════╪═════════╡
# │ 1   ┆ 3   ┆ -1      │
# │ 3   ┆ 4   ┆ 1       │
# │ 4   ┆ 0   ┆ 1       │
# └─────┴─────┴─────────┘

Returns:

  • (When)


# File 'lib/polars/lazy_functions.rb', line 1175

def when(expr)
  expr = Utils.expr_to_lit_or_expr(expr)
  pw = RbExpr.when(expr._rbexpr)
  When.new(pw)
end
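
A "when, then, otherwise" expression picks one of two values per row depending on a condition. The plain-Ruby sketch below mirrors the example above on an array; `when_then_otherwise` is a hypothetical helper that takes the condition as a block.

```ruby
# Choose then_value where the block is truthy for an element, and
# otherwise_value elsewhere.
# Hypothetical helper for illustration only.
def when_then_otherwise(values, then_value, otherwise_value)
  values.map { |value| yield(value) ? then_value : otherwise_value }
end

foo = [1, 3, 4]
p when_then_otherwise(foo, 1, -1) { |value| value > 2 } # => [-1, 1, 1]
```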