Module: Polars::Functions

Included in:
Polars
Defined in:
lib/polars/functions.rb

Instance Method Details

#align_frames(*frames, on:, select: nil, reverse: false) ⇒ Object

Align a sequence of frames using the unique values from one or more columns as a key.

Frames that do not contain the given key values have rows injected (with nulls filling the non-key columns), and each resulting frame is sorted by the key.

The original column order of input frames is not changed unless select is specified (in which case the final column order is determined from that).

Note that this does not produce a joined frame: you receive the same number of frames back that you passed in, but each is now aligned by key and has the same number of rows.

Examples:

df1 = Polars::DataFrame.new(
  {
    "dt" => [Date.new(2022, 9, 1), Date.new(2022, 9, 2), Date.new(2022, 9, 3)],
    "x" => [3.5, 4.0, 1.0],
    "y" => [10.0, 2.5, 1.5]
  }
)
df2 = Polars::DataFrame.new(
  {
    "dt" => [Date.new(2022, 9, 2), Date.new(2022, 9, 3), Date.new(2022, 9, 1)],
    "x" => [8.0, 1.0, 3.5],
    "y" => [1.5, 12.0, 5.0]
  }
)
df3 = Polars::DataFrame.new(
  {
    "dt" => [Date.new(2022, 9, 3), Date.new(2022, 9, 2)],
    "x" => [2.0, 5.0],
    "y" => [2.5, 2.0]
  }
)
af1, af2, af3 = Polars.align_frames(
  df1, df2, df3, on: "dt", select: ["x", "y"]
)
(af1 * af2 * af3).fill_null(0).select(Polars.sum(Polars.col("*")).alias("dot"))
# =>
# shape: (3, 1)
# ┌───────┐
# │ dot   │
# │ ---   │
# │ f64   │
# ╞═══════╡
# │ 0.0   │
# ├╌╌╌╌╌╌╌┤
# │ 167.5 │
# ├╌╌╌╌╌╌╌┤
# │ 47.0  │
# └───────┘
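
The alignment logic described above can be sketched in plain Ruby without the gem. This is a minimal illustration, not the library's implementation: `align_rows` is a hypothetical helper, each "frame" is an array of row hashes, only a single key column is handled, and keys are assumed to be unique within each frame.

```ruby
require "date"

# Hypothetical helper: a "frame" here is an Array of row Hashes, not a Polars frame.
def align_rows(*frames, on:, reverse: false)
  return [] if frames.empty?

  # Superset of key values across all frames, de-duplicated and sorted.
  keys = frames.flat_map { |rows| rows.map { |r| r[on] } }.uniq.sort
  keys.reverse! if reverse

  frames.map do |rows|
    columns = rows.flat_map(&:keys).uniq
    by_key = rows.to_h { |r| [r[on], r] }
    # Left join from the key superset: a missing key yields a row of nils.
    keys.map do |k|
      by_key.fetch(k) { columns.to_h { |c| [c, c == on ? k : nil] } }
    end
  end
end

a = [{ "dt" => Date.new(2022, 9, 1), "x" => 3.5 }, { "dt" => Date.new(2022, 9, 2), "x" => 4.0 }]
b = [{ "dt" => Date.new(2022, 9, 3), "x" => 2.0 }, { "dt" => Date.new(2022, 9, 2), "x" => 5.0 }]
aligned_a, aligned_b = align_rows(a, b, on: "dt")
```

Both results have three rows in the same key order, so they can be combined row by row, which is what the element-wise product in the example relies on.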

Parameters:

  • frames (Array)

    Sequence of DataFrames or LazyFrames.

  • on (Object)

    One or more columns whose unique values will be used to align the frames.

  • select (Object) (defaults to: nil)

    Optional post-alignment column select to constrain and/or order the columns returned from the newly aligned frames.

  • reverse (Object) (defaults to: false)

    Sort the alignment column values in descending order. Accepts a single boolean or an array of booleans, one per column in on.

Returns:

  • (Object)

# File 'lib/polars/functions.rb', line 359

def align_frames(
  *frames,
  on:,
  select: nil,
  reverse: false
)
  if frames.empty?
    return []
  elsif frames.map(&:class).uniq.length != 1
    raise TypeError, "Input frames must be of a consistent type (all LazyFrame or all DataFrame)"
  end

  # establish the superset of all "on" column values, sort, and cache
  eager = frames[0].is_a?(DataFrame)
  alignment_frame = (
    concat(frames.map { |df| df.lazy.select(on) })
      .unique(maintain_order: false)
      .sort(on, reverse: reverse)
  )
  alignment_frame = (
    eager ? alignment_frame.collect.lazy : alignment_frame.cache
  )
  # finally, align all frames
  aligned_frames =
    frames.map do |df|
      alignment_frame.join(
        df.lazy,
        on: alignment_frame.columns,
        how: "left"
      ).select(df.columns)
    end
  if !select.nil?
    aligned_frames = aligned_frames.map { |df| df.select(select) }
  end

  eager ? aligned_frames.map(&:collect) : aligned_frames
end

#concat(items, rechunk: true, how: "vertical", parallel: true) ⇒ Object

Aggregate multiple DataFrames/Series into a single DataFrame/Series.

Examples:

df1 = Polars::DataFrame.new({"a" => [1], "b" => [3]})
df2 = Polars::DataFrame.new({"a" => [2], "b" => [4]})
Polars.concat([df1, df2])
# =>
# shape: (2, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1   ┆ 3   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2   ┆ 4   │
# └─────┴─────┘

Parameters:

  • items (Object)

    DataFrames/Series/LazyFrames to concatenate.

  • rechunk (Boolean) (defaults to: true)

    Make sure that all data is in contiguous memory.

  • how ("vertical", "diagonal", "horizontal") (defaults to: "vertical")

    Concatenation strategy. LazyFrames only support the 'vertical' strategy.

    • Vertical: applies multiple vstack operations.
    • Diagonal: finds a union between the column schemas and fills missing column values with null.
    • Horizontal: stacks Series horizontally and fills with nulls if the lengths don't match.
  • parallel (Boolean) (defaults to: true)

    Only relevant for LazyFrames. This determines if the concatenated lazy computations may be executed in parallel.
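
The three `how` strategies can be illustrated with plain Ruby hashes. This is a rough sketch under stated assumptions, not how the gem works internally: `concat_columns` and `height` are hypothetical helpers, and a "frame" is a Hash of column name to Array of values.

```ruby
# Hypothetical stand-in: a "frame" is a Hash of column name => Array of values.
def height(frame)
  frame.values.map(&:length).max || 0
end

def concat_columns(frames, how: "vertical")
  case how
  when "vertical"
    # Identical columns required; append each column's values in order.
    names = frames.first.keys
    unless frames.all? { |f| f.keys == names }
      raise ArgumentError, "vertical concat needs identical columns"
    end
    names.to_h { |n| [n, frames.flat_map { |f| f[n] }] }
  when "diagonal"
    # Union of column names; a frame lacking a column contributes nils.
    names = frames.flat_map(&:keys).uniq
    names.to_h do |n|
      [n, frames.flat_map { |f| f.fetch(n) { Array.new(height(f)) } }]
    end
  when "horizontal"
    # Columns side by side; shorter columns are padded with nils.
    # Duplicate column names are silently overwritten in this sketch.
    rows = frames.map { |f| height(f) }.max
    frames.reduce({}) do |out, f|
      out.merge(f.transform_values { |v| v + Array.new(rows - v.length) })
    end
  else
    raise ArgumentError, "unknown strategy: #{how}"
  end
end

vertical = concat_columns([{ "a" => [1], "b" => [3] }, { "a" => [2], "b" => [4] }])
diagonal = concat_columns([{ "a" => [1], "b" => [3] }, { "a" => [2], "c" => [5] }], how: "diagonal")
horizontal = concat_columns([{ "a" => [1, 2] }, { "b" => [3] }], how: "horizontal")
```

The vertical case mirrors the example above: two one-row frames with the same columns become one two-row frame.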

Returns:

  • (Object)

# File 'lib/polars/functions.rb', line 49

def concat(items, rechunk: true, how: "vertical", parallel: true)
  if items.empty?
    raise ArgumentError, "cannot concat empty list"
  end

  first = items[0]
  if first.is_a?(DataFrame)
    if how == "vertical"
      out = Utils.wrap_df(_concat_df(items))
    elsif how == "diagonal"
      out = Utils.wrap_df(_diag_concat_df(items))
    elsif how == "horizontal"
      out = Utils.wrap_df(_hor_concat_df(items))
    else
      raise ArgumentError, "how must be one of {{'vertical', 'diagonal', 'horizontal'}}, got #{how}"
    end
  elsif first.is_a?(LazyFrame)
    if how == "vertical"
      # TODO
      return Utils.wrap_ldf(_concat_lf(items, rechunk, parallel))
    else
      raise ArgumentError, "Lazy only allows 'vertical' concat strategy."
    end
  elsif first.is_a?(Series)
    # TODO
    out = Utils.wrap_s(_concat_series(items))
  elsif first.is_a?(Expr)
    out = first
    items[1..-1].each do |e|
      out = out.append(e)
    end
  else
    raise ArgumentError, "did not expect type: #{first.class.name} in 'Polars.concat'."
  end

  if rechunk
    out.rechunk
  else
    out
  end
end

#date_range(low, high, interval, lazy: false, closed: "both", name: nil, time_unit: nil, time_zone: nil) ⇒ Object

Note:

If both low and high are passed as date types (not datetime), and the interval granularity is no finer than 1d, the returned range is also of type date. All other permutations return a datetime Series.

Create a range of type Datetime (or Date).

Examples:

Using a Polars duration string to specify the interval:

Polars.date_range(Date.new(2022, 1, 1), Date.new(2022, 3, 1), "1mo", name: "drange")
# =>
# shape: (3,)
# Series: 'drange' [date]
# [
#         2022-01-01
#         2022-02-01
#         2022-03-01
# ]

Using a compound duration string with an explicit time unit:

Polars.date_range(
  DateTime.new(1985, 1, 1),
  DateTime.new(1985, 1, 10),
  "1d12h",
  time_unit: "ms"
)
# =>
# shape: (7,)
# Series: '' [datetime[ms]]
# [
#         1985-01-01 00:00:00
#         1985-01-02 12:00:00
#         1985-01-04 00:00:00
#         1985-01-05 12:00:00
#         1985-01-07 00:00:00
#         1985-01-08 12:00:00
#         1985-01-10 00:00:00
# ]
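
How the bounds and the `closed` option interact can be sketched with plain integers. This is an illustrative approximation, not the library's code: `closed_range` is a hypothetical helper, and the values stand for hours since the lower bound.

```ruby
# Hypothetical helper working on plain integers (here: hours since the start).
def closed_range(low, high, step, closed: "both")
  raise ArgumentError, "step must be positive" unless step.positive?

  points = low.step(high, step).to_a
  # Drop the lower bound unless the left side is closed.
  points.shift if %w[right none].include?(closed) && points.first == low
  # Drop the upper bound unless the right side is closed.
  points.pop if %w[left none].include?(closed) && points.last == high
  points
end

# 1985-01-01 to 1985-01-10 spans 216 hours; "1d12h" is a 36-hour step.
both = closed_range(0, 216, 36)
left = closed_range(0, 216, 36, closed: "left")
```

With both ends closed the sketch yields seven points, matching the seven rows in the example above; `closed: "left"` drops the final one.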

Parameters:

  • low (Object)

    Lower bound of the date range.

  • high (Object)

    Upper bound of the date range.

  • interval (Object)

    Interval periods. It can be a polars duration string, such as 3d12h4m25s representing 3 days, 12 hours, 4 minutes, and 25 seconds.

  • lazy (Boolean) (defaults to: false)

    Return an expression.

  • closed ("both", "left", "right", "none") (defaults to: "both")

    Which ends of the range are inclusive: both bounds, only the lower ("left"), only the upper ("right"), or neither ("none").

  • name (String) (defaults to: nil)

    Name of the output Series.

  • time_unit (nil, "ns", "us", "ms") (defaults to: nil)

    Set the time unit.

  • time_zone (String) (defaults to: nil)

    Optional time zone.
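
To make the duration string format for `interval` concrete, here is a small parser written for illustration only. `duration_to_seconds` and `UNIT_SECONDS` are hypothetical names, and the unit table is an assumption that covers fixed-length units only; calendar units such as "mo" and "y" are deliberately left out because they have no fixed length in seconds.

```ruby
# Assumed unit table for this sketch (fixed-length units only).
UNIT_SECONDS = { "w" => 604_800, "d" => 86_400, "h" => 3_600, "m" => 60, "s" => 1 }.freeze

def duration_to_seconds(text)
  # Spaces are ignored, so "1d 12h" and "1d12h" are equivalent.
  compact = text.delete(" ")
  parts = compact.scan(/(\d+)([a-z]+)/)
  # Reject input that is not entirely made of <count><unit> pairs.
  if parts.empty? || parts.map(&:join).join != compact
    raise ArgumentError, "invalid duration: #{text}"
  end

  parts.sum do |count, unit|
    seconds = UNIT_SECONDS.fetch(unit) { raise ArgumentError, "unsupported unit: #{unit}" }
    seconds * count.to_i
  end
end

total = duration_to_seconds("3d12h4m25s")
```

Here `total` is 302665, that is 3 days, 12 hours, 4 minutes, and 25 seconds expressed in seconds.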

Returns:

  • (Object)

# File 'lib/polars/functions.rb', line 148

def date_range(
  low,
  high,
  interval,
  lazy: false,
  closed: "both",
  name: nil,
  time_unit: nil,
  time_zone: nil
)
  if defined?(ActiveSupport::Duration) && interval.is_a?(ActiveSupport::Duration)
    raise Todo
  else
    interval = interval.to_s
    if interval.include?(" ")
      interval = interval.gsub(" ", "")
    end
  end

  if low.is_a?(Expr) || high.is_a?(Expr) || lazy
    low = Utils.expr_to_lit_or_expr(low, str_to_lit: true)
    high = Utils.expr_to_lit_or_expr(high, str_to_lit: true)
    return Utils.wrap_expr(
      _rb_date_range_lazy(low, high, interval, closed, name, time_zone)
    )
  end

  low, low_is_date = _ensure_datetime(low)
  high, high_is_date = _ensure_datetime(high)

  if !time_unit.nil?
    tu = time_unit
  elsif interval.include?("ns")
    tu = "ns"
  else
    tu = "us"
  end

  start = Utils._datetime_to_pl_timestamp(low, tu)
  stop = Utils._datetime_to_pl_timestamp(high, tu)
  if name.nil?
    name = ""
  end

  dt_range = Utils.wrap_s(
    _rb_date_range(start, stop, interval, closed, name, tu, time_zone)
  )
  if low_is_date && high_is_date && !["h", "m", "s"].any? { |v| _interval_granularity(interval).end_with?(v) }
    dt_range = dt_range.cast(Date)
  end

  dt_range
end
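
The final cast in the listing above decides whether the result is a date or a datetime Series. The rule can be approximated in plain Ruby as follows. `date_result?` is a hypothetical helper, and it assumes the last unit in the interval string is the finest one, which may differ from what the internal `_interval_granularity` helper does.

```ruby
require "date"

# Hypothetical helper: true when the range would be returned as dates.
def date_result?(low, high, interval)
  # Assumption: the last unit in the string is the finest granularity.
  finest_unit = interval.delete(" ").scan(/[a-z]+/).last
  # DateTime is a subclass of Date, so instance_of? is used instead of is_a?.
  both_dates = [low, high].all? { |v| v.instance_of?(Date) }
  # "h", "m", "s" also catch "ms", "us", and "ns" because they end with "s".
  both_dates && !finest_unit.to_s.end_with?("h", "m", "s")
end

monthly = date_result?(Date.new(2022, 1, 1), Date.new(2022, 3, 1), "1mo")
hourly = date_result?(Date.new(2022, 1, 1), Date.new(2022, 3, 1), "1d12h")
```

In this sketch `monthly` is true and `hourly` is false, in line with the note that date bounds only produce a date range when the interval is no finer than one day.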

#get_dummies(df, columns: nil) ⇒ DataFrame

Convert categorical variables into dummy/indicator variables.

Parameters:

  • df (DataFrame)

    DataFrame to convert.

  • columns (Array, nil) (defaults to: nil)

    A subset of columns to convert to dummy variables. nil means "all columns".
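
As an illustration of what dummy/indicator encoding means, the following plain-Ruby sketch expands each selected column into one 0/1 column per distinct value. `dummies` is a hypothetical helper, a "frame" is a Hash of column name to Array, and the `name_value` column naming and column order are assumptions of this sketch rather than guaranteed behavior of the gem.

```ruby
# Hypothetical stand-in: a "frame" is a Hash of column name => Array of values.
def dummies(frame, columns: nil)
  # nil means "all columns", as in the parameter description.
  targets = columns || frame.keys
  frame.each_with_object({}) do |(name, values), out|
    unless targets.include?(name)
      out[name] = values # untouched columns pass through unchanged
      next
    end
    # One indicator column per distinct value, named "<column>_<value>".
    values.uniq.each do |v|
      out["#{name}_#{v}"] = values.map { |x| x == v ? 1 : 0 }
    end
  end
end

encoded = dummies({ "color" => %w[red blue red], "n" => [1, 2, 3] }, columns: ["color"])
```

Only "color" is expanded here; "n" is carried over as-is because it is not listed in `columns`.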

Returns:

  • (DataFrame)

# File 'lib/polars/functions.rb', line 12

def get_dummies(df, columns: nil)
  df.to_dummies(columns: columns)
end

#ones(n, dtype: nil) ⇒ Series

Note:

In the lazy API you should probably not use this, but use lit(1) instead.

Return a new Series of given length and type, filled with ones.

Parameters:

  • n (Integer)

    Number of elements in the Series.

  • dtype (Symbol) (defaults to: nil)

    Data type of the elements; defaults to :f64.

Returns:

  • (Series)

# File 'lib/polars/functions.rb', line 409

def ones(n, dtype: nil)
  s = Series.new([1.0])
  if dtype
    s = s.cast(dtype)
  end
  s.new_from_index(0, n)
end

#zeros(n, dtype: nil) ⇒ Series

Note:

In the lazy API you should probably not use this, but use lit(0) instead.

Return a new Series of given length and type, filled with zeros.

Parameters:

  • n (Integer)

    Number of elements in the Series.

  • dtype (Symbol) (defaults to: nil)

    Data type of the elements; defaults to :f64.

Returns:

  • (Series)

# File 'lib/polars/functions.rb', line 429

def zeros(n, dtype: nil)
  s = Series.new([0.0])
  if dtype
    s = s.cast(dtype)
  end
  s.new_from_index(0, n)
end