Module: Polars

Extended by:
Convert, Functions, IO, LazyFunctions
Defined in:
lib/polars.rb,
lib/polars/io.rb,
lib/polars/expr.rb,
lib/polars/plot.rb,
lib/polars/when.rb,
lib/polars/slice.rb,
lib/polars/utils.rb,
lib/polars/config.rb,
lib/polars/series.rb,
lib/polars/convert.rb,
lib/polars/version.rb,
lib/polars/cat_expr.rb,
lib/polars/group_by.rb,
lib/polars/functions.rb,
lib/polars/list_expr.rb,
lib/polars/meta_expr.rb,
lib/polars/name_expr.rb,
lib/polars/when_then.rb,
lib/polars/array_expr.rb,
lib/polars/data_frame.rb,
lib/polars/data_types.rb,
lib/polars/exceptions.rb,
lib/polars/lazy_frame.rb,
lib/polars/binary_expr.rb,
lib/polars/sql_context.rb,
lib/polars/string_expr.rb,
lib/polars/struct_expr.rb,
lib/polars/expr_dispatch.rb,
lib/polars/lazy_group_by.rb,
lib/polars/cat_name_space.rb,
lib/polars/date_time_expr.rb,
lib/polars/lazy_functions.rb,
lib/polars/list_name_space.rb,
lib/polars/array_name_space.rb,
lib/polars/dynamic_group_by.rb,
lib/polars/rolling_group_by.rb,
lib/polars/binary_name_space.rb,
lib/polars/string_name_space.rb,
lib/polars/struct_name_space.rb,
lib/polars/batched_csv_reader.rb,
lib/polars/date_time_name_space.rb

Defined Under Namespace

Modules: Convert, Functions, IO, LazyFunctions, Plot Classes: Array, ArrayExpr, ArrayNameSpace, Binary, BinaryExpr, BinaryNameSpace, Boolean, CatExpr, CatNameSpace, Categorical, Config, DataFrame, DataType, Date, DateTimeExpr, DateTimeNameSpace, Datetime, Decimal, Duration, DynamicGroupBy, Expr, Field, Float32, Float64, FloatType, FractionalType, GroupBy, Int16, Int32, Int64, Int8, IntegralType, LazyFrame, LazyGroupBy, List, ListExpr, ListNameSpace, MetaExpr, NameExpr, NestedType, Null, NumericType, Object, RollingGroupBy, SQLContext, Series, String, StringExpr, StringNameSpace, Struct, StructExpr, StructNameSpace, TemporalType, Time, UInt16, UInt32, UInt64, UInt8, Unknown

Constant Summary

Utf8 = String

Allow Utf8 as an alias for String

Class Method Summary

Class Method Details

.align_frames(*frames, on:, select: nil, reverse: false) ⇒ Object Originally defined in module Functions

Align a sequence of frames using the unique values from one or more columns as a key.

Frames that do not contain the given key values have rows injected (with nulls filling the non-key columns), and each resulting frame is sorted by the key.

The original column order of input frames is not changed unless select is specified (in which case the final column order is determined from that).

Note that this does not result in a joined frame - you receive the same number of frames back that you passed in, but each is now aligned by key and has the same number of rows.

Examples:

df1 = Polars::DataFrame.new(
  {
    "dt" => [Date.new(2022, 9, 1), Date.new(2022, 9, 2), Date.new(2022, 9, 3)],
    "x" => [3.5, 4.0, 1.0],
    "y" => [10.0, 2.5, 1.5]
  }
)
df2 = Polars::DataFrame.new(
  {
    "dt" => [Date.new(2022, 9, 2), Date.new(2022, 9, 3), Date.new(2022, 9, 1)],
    "x" => [8.0, 1.0, 3.5],
    "y" => [1.5, 12.0, 5.0]
  }
)
df3 = Polars::DataFrame.new(
  {
    "dt" => [Date.new(2022, 9, 3), Date.new(2022, 9, 2)],
    "x" => [2.0, 5.0],
    "y" => [2.5, 2.0]
  }
)
af1, af2, af3 = Polars.align_frames(
  df1, df2, df3, on: "dt", select: ["x", "y"]
)
(af1 * af2 * af3).fill_null(0).select(Polars.sum(Polars.col("*")).alias("dot"))
# =>
# shape: (3, 1)
# ┌───────┐
# │ dot   │
# │ ---   │
# │ f64   │
# ╞═══════╡
# │ 0.0   │
# ├╌╌╌╌╌╌╌┤
# │ 167.5 │
# ├╌╌╌╌╌╌╌┤
# │ 47.0  │
# └───────┘
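
The alignment rule described above can be modelled in plain Ruby. This is only a sketch of the semantics, not how Polars implements it: frames are represented as arrays of row hashes, the helper name align_rows is hypothetical, and each key is assumed to appear at most once per frame.

```ruby
# Plain-Ruby model of key alignment (illustrative only, not the Polars API).
# Assumption: each frame is an array of row hashes with unique key values.
def align_rows(frames, on:)
  # Collect every key value seen in any frame, sorted ascending.
  keys = frames.flat_map { |rows| rows.map { |r| r[on] } }.uniq.sort
  frames.map do |rows|
    by_key = rows.to_h { |r| [r[on], r] }
    columns = rows.flat_map(&:keys).uniq
    keys.map do |k|
      # A missing key gets an injected row whose non-key columns are nil.
      by_key.fetch(k) { columns.to_h { |c| [c, c == on ? k : nil] } }
    end
  end
end

f1 = [{ "dt" => 1, "x" => 3.5 }, { "dt" => 2, "x" => 4.0 }, { "dt" => 3, "x" => 1.0 }]
f2 = [{ "dt" => 3, "x" => 2.0 }, { "dt" => 2, "x" => 5.0 }]
a1, a2 = align_rows([f1, f2], on: "dt")
```

Two frames go in and two come out; both now have three rows sorted by "dt", and the second has a nil "x" for the key it was missing.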

.all(name = nil) ⇒ Expr Originally defined in module LazyFunctions

Depending on the input, this function does one of two things:

  • performs a columnwise or elementwise AND operation
  • performs a wildcard column selection

Examples:

Sum all columns

df = Polars::DataFrame.new(
  {"a" => [1, 2, 3], "b" => ["hello", "foo", "bar"], "c" => [1, 1, 1]}
)
df.select(Polars.all.sum)
# =>
# shape: (1, 3)
# ┌─────┬──────┬─────┐
# │ a   ┆ b    ┆ c   │
# │ --- ┆ ---  ┆ --- │
# │ i64 ┆ str  ┆ i64 │
# ╞═════╪══════╪═════╡
# │ 6   ┆ null ┆ 3   │
# └─────┴──────┴─────┘

.any(name) ⇒ Expr Originally defined in module LazyFunctions

Evaluate columnwise or elementwise with a bitwise OR operation.

.arg_sort_by(exprs, reverse: false) ⇒ Expr Also known as: argsort_by Originally defined in module LazyFunctions

Find the indexes that would sort the columns.

Argsort by multiple columns. The first column will be used for the ordering. If there are duplicates in the first column, the second column will be used to determine the ordering and so on.
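
The tie-breaking rule can be illustrated with a plain-Ruby model, where columns are ordinary arrays. The helper below is hypothetical and only mirrors the ordering described above; it is not the Polars implementation and ignores the reverse option.

```ruby
# Plain-Ruby model of a multi-column argsort (illustrative only).
# Each row index is sorted by the tuple of its values across all columns,
# so later columns only matter when earlier ones are tied.
def arg_sort_by_model(columns)
  (0...columns.first.length).sort_by { |i| columns.map { |col| col[i] } }
end

a = [2, 1, 2, 1]
b = [9, 5, 3, 7]
order = arg_sort_by_model([a, b])
# order == [1, 3, 2, 0]
```

Rows 1 and 3 tie on the first column (both 1), so the second column (5 before 7) decides their order.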

.arg_where(condition, eager: false) ⇒ Expr, Series Originally defined in module LazyFunctions

Return indices where condition evaluates true.

Examples:

df = Polars::DataFrame.new({"a" => [1, 2, 3, 4, 5]})
df.select(
  [
    Polars.arg_where(Polars.col("a") % 2 == 0)
  ]
).to_series
# =>
# shape: (2,)
# Series: 'a' [u32]
# [
#         1
#         3
# ]

.avg(column) ⇒ Expr, Float Originally defined in module LazyFunctions

Get the mean value.

.coalesce(exprs, *more_exprs) ⇒ Expr Originally defined in module LazyFunctions

Folds the expressions from left to right, keeping the first non-null value.

Examples:

df = Polars::DataFrame.new(
  [
    [nil, 1.0, 1.0],
    [nil, 2.0, 2.0],
    [nil, nil, 3.0],
    [nil, nil, nil]
  ],
  columns: [["a", :f64], ["b", :f64], ["c", :f64]]
)
df.with_column(Polars.coalesce(["a", "b", "c", 99.9]).alias("d"))
# =>
# shape: (4, 4)
# ┌──────┬──────┬──────┬──────┐
# │ a    ┆ b    ┆ c    ┆ d    │
# │ ---  ┆ ---  ┆ ---  ┆ ---  │
# │ f64  ┆ f64  ┆ f64  ┆ f64  │
# ╞══════╪══════╪══════╪══════╡
# │ null ┆ 1.0  ┆ 1.0  ┆ 1.0  │
# │ null ┆ 2.0  ┆ 2.0  ┆ 2.0  │
# │ null ┆ null ┆ 3.0  ┆ 3.0  │
# │ null ┆ null ┆ null ┆ 99.9 │
# └──────┴──────┴──────┴──────┘

.col(name) ⇒ Expr Originally defined in module LazyFunctions

Return an expression representing a column in a DataFrame.

.collect_all(lazy_frames, type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ Array Originally defined in module LazyFunctions

Collect multiple LazyFrames at the same time.

This runs all the computation graphs in parallel on the Polars thread pool.

.concat(items, rechunk: true, how: "vertical", parallel: true) ⇒ Object Originally defined in module Functions

Aggregate multiple DataFrames/Series into a single DataFrame/Series.

Examples:

df1 = Polars::DataFrame.new({"a" => [1], "b" => [3]})
df2 = Polars::DataFrame.new({"a" => [2], "b" => [4]})
Polars.concat([df1, df2])
# =>
# shape: (2, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1   ┆ 3   │
# │ 2   ┆ 4   │
# └─────┴─────┘

.concat_list(exprs) ⇒ Expr Originally defined in module LazyFunctions

Concatenate the arrays in a Series of dtype List in linear time.

.concat_str(exprs, sep: "") ⇒ Expr Originally defined in module LazyFunctions

Horizontally concat Utf8 Series in linear time. Non-Utf8 columns are cast to Utf8.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2, 3],
    "b" => ["dogs", "cats", nil],
    "c" => ["play", "swim", "walk"]
  }
)
df.with_columns(
  [
    Polars.concat_str(
      [
        Polars.col("a") * 2,
        Polars.col("b"),
        Polars.col("c")
      ],
      sep: " "
    ).alias("full_sentence")
  ]
)
# =>
# shape: (3, 4)
# ┌─────┬──────┬──────┬───────────────┐
# │ a   ┆ b    ┆ c    ┆ full_sentence │
# │ --- ┆ ---  ┆ ---  ┆ ---           │
# │ i64 ┆ str  ┆ str  ┆ str           │
# ╞═════╪══════╪══════╪═══════════════╡
# │ 1   ┆ dogs ┆ play ┆ 2 dogs play   │
# │ 2   ┆ cats ┆ swim ┆ 4 cats swim   │
# │ 3   ┆ null ┆ walk ┆ null          │
# └─────┴──────┴──────┴───────────────┘

.count(column = nil) ⇒ Expr, Integer Originally defined in module LazyFunctions

Count the number of values in this column/context.

.cov(a, b) ⇒ Expr Originally defined in module LazyFunctions

Compute the covariance between two columns/expressions.

.cumfold(acc, f, exprs, include_init: false) ⇒ Object Originally defined in module LazyFunctions

Note:

If you simply want the first encountered expression as accumulator, consider using cumreduce.

Cumulatively accumulate over multiple columns horizontally/row wise with a left fold.

Every cumulative result is added as a separate field in a Struct column.
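
As a rough illustration of what a cumulative horizontal fold produces, here is a plain-Ruby model with columns as ordinary arrays. The helper name cumfold_model is hypothetical, and treating the initial accumulator as a scalar broadcast to every row is an assumption made for the sketch.

```ruby
# Plain-Ruby model of a cumulative left fold across columns (illustrative only).
# Every intermediate accumulator is kept as one field of the per-row result.
def cumfold_model(init, columns, include_init: false)
  acc = Array.new(columns.first.length, init) # broadcast the initial value
  fields = include_init ? [acc] : []
  columns.each do |col|
    acc = acc.zip(col).map { |running, value| yield(running, value) }
    fields << acc
  end
  fields.transpose # one array of fields ("struct") per row
end

a = [1, 2]
c = [5, 6]
rows = cumfold_model(0, [a, c]) { |x, y| x + y }
# rows == [[1, 6], [2, 8]]
with_init = cumfold_model(0, [a, c], include_init: true) { |x, y| x + y }
# with_init == [[0, 1, 6], [0, 2, 8]]
```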

.cumsum(column) ⇒ Object Originally defined in module LazyFunctions

Cumulatively sum values in a column/Series, or horizontally across a list of columns/expressions.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2],
    "b" => [3, 4],
    "c" => [5, 6]
  }
)
# =>
# shape: (2, 3)
# ┌─────┬─────┬─────┐
# │ a   ┆ b   ┆ c   │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 3   ┆ 5   │
# │ 2   ┆ 4   ┆ 6   │
# └─────┴─────┴─────┘

Cumulatively sum a column by name:

df.select(Polars.cumsum("a"))
# =>
# shape: (2, 1)
# ┌─────┐
# │ a   │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# │ 3   │
# └─────┘

Cumulatively sum a list of columns/expressions horizontally:

df.with_column(Polars.cumsum(["a", "c"]))
# =>
# shape: (2, 4)
# ┌─────┬─────┬─────┬───────────┐
# │ a   ┆ b   ┆ c   ┆ cumsum    │
# │ --- ┆ --- ┆ --- ┆ ---       │
# │ i64 ┆ i64 ┆ i64 ┆ struct[2] │
# ╞═════╪═════╪═════╪═══════════╡
# │ 1   ┆ 3   ┆ 5   ┆ {1,6}     │
# │ 2   ┆ 4   ┆ 6   ┆ {2,8}     │
# └─────┴─────┴─────┴───────────┘

.date_range(start, stop, interval, lazy: false, closed: "both", name: nil, time_unit: nil, time_zone: nil) ⇒ Object Originally defined in module Functions

Note:

If both start and stop are passed as date types (not datetime), and the interval granularity is no finer than 1d, the returned range is also of type date. All other permutations return a datetime Series.

Create a range of type Datetime (or Date).

Examples:

Using a Polars duration string to specify the interval:

Polars.date_range(Date.new(2022, 1, 1), Date.new(2022, 3, 1), "1mo", name: "drange")
# =>
# shape: (3,)
# Series: 'drange' [date]
# [
#         2022-01-01
#         2022-02-01
#         2022-03-01
# ]

Using a compound duration string to specify the interval:

Polars.date_range(
  DateTime.new(1985, 1, 1),
  DateTime.new(1985, 1, 10),
  "1d12h",
  time_unit: "ms"
)
# =>
# shape: (7,)
# Series: '' [datetime[ms]]
# [
#         1985-01-01 00:00:00
#         1985-01-02 12:00:00
#         1985-01-04 00:00:00
#         1985-01-05 12:00:00
#         1985-01-07 00:00:00
#         1985-01-08 12:00:00
#         1985-01-10 00:00:00
# ]

.duration(weeks: nil, days: nil, hours: nil, minutes: nil, seconds: nil, milliseconds: nil, microseconds: nil, nanoseconds: nil, time_unit: "us") ⇒ Expr Originally defined in module LazyFunctions

Create polars Duration from distinct time components.

Examples:

df = Polars::DataFrame.new(
  {
    "datetime" => [DateTime.new(2022, 1, 1), DateTime.new(2022, 1, 2)],
    "add" => [1, 2]
  }
)
df.select(
  [
    (Polars.col("datetime") + Polars.duration(weeks: "add")).alias("add_weeks"),
    (Polars.col("datetime") + Polars.duration(days: "add")).alias("add_days"),
    (Polars.col("datetime") + Polars.duration(seconds: "add")).alias("add_seconds"),
    (Polars.col("datetime") + Polars.duration(milliseconds: "add")).alias(
      "add_milliseconds"
    ),
    (Polars.col("datetime") + Polars.duration(hours: "add")).alias("add_hours")
  ]
)
# =>
# shape: (2, 5)
# ┌─────────────────────┬─────────────────────┬─────────────────────┬─────────────────────────┬─────────────────────┐
# │ add_weeks           ┆ add_days            ┆ add_seconds         ┆ add_milliseconds        ┆ add_hours           │
# │ ---                 ┆ ---                 ┆ ---                 ┆ ---                     ┆ ---                 │
# │ datetime[ns]        ┆ datetime[ns]        ┆ datetime[ns]        ┆ datetime[ns]            ┆ datetime[ns]        │
# ╞═════════════════════╪═════════════════════╪═════════════════════╪═════════════════════════╪═════════════════════╡
# │ 2022-01-08 00:00:00 ┆ 2022-01-02 00:00:00 ┆ 2022-01-01 00:00:01 ┆ 2022-01-01 00:00:00.001 ┆ 2022-01-01 01:00:00 │
# │ 2022-01-16 00:00:00 ┆ 2022-01-04 00:00:00 ┆ 2022-01-02 00:00:02 ┆ 2022-01-02 00:00:00.002 ┆ 2022-01-02 02:00:00 │
# └─────────────────────┴─────────────────────┴─────────────────────┴─────────────────────────┴─────────────────────┘

.element ⇒ Expr Originally defined in module LazyFunctions

Alias for an element being evaluated in an eval expression.

Examples:

A horizontal rank computation by taking the elements of a list

df = Polars::DataFrame.new({"a" => [1, 8, 3], "b" => [4, 5, 2]})
df.with_column(
  Polars.concat_list(["a", "b"]).list.eval(Polars.element.rank).alias("rank")
)
# =>
# shape: (3, 3)
# ┌─────┬─────┬────────────┐
# │ a   ┆ b   ┆ rank       │
# │ --- ┆ --- ┆ ---        │
# │ i64 ┆ i64 ┆ list[f64]  │
# ╞═════╪═════╪════════════╡
# │ 1   ┆ 4   ┆ [1.0, 2.0] │
# │ 8   ┆ 5   ┆ [2.0, 1.0] │
# │ 3   ┆ 2   ┆ [2.0, 1.0] │
# └─────┴─────┴────────────┘

.exclude(columns) ⇒ Object Originally defined in module LazyFunctions

Exclude certain columns from a wildcard/regex selection.

Examples:

df = Polars::DataFrame.new(
  {
    "aa" => [1, 2, 3],
    "ba" => ["a", "b", nil],
    "cc" => [nil, 2.5, 1.5]
  }
)
# =>
# shape: (3, 3)
# ┌─────┬──────┬──────┐
# │ aa  ┆ ba   ┆ cc   │
# │ --- ┆ ---  ┆ ---  │
# │ i64 ┆ str  ┆ f64  │
# ╞═════╪══════╪══════╡
# │ 1   ┆ a    ┆ null │
# │ 2   ┆ b    ┆ 2.5  │
# │ 3   ┆ null ┆ 1.5  │
# └─────┴──────┴──────┘

Exclude by column name(s):

df.select(Polars.exclude("ba"))
# =>
# shape: (3, 2)
# ┌─────┬──────┐
# │ aa  ┆ cc   │
# │ --- ┆ ---  │
# │ i64 ┆ f64  │
# ╞═════╪══════╡
# │ 1   ┆ null │
# │ 2   ┆ 2.5  │
# │ 3   ┆ 1.5  │
# └─────┴──────┘

Exclude by regex, e.g. removing all columns whose names end with the letter "a":

df.select(Polars.exclude("^.*a$"))
# =>
# shape: (3, 1)
# ┌──────┐
# │ cc   │
# │ ---  │
# │ f64  │
# ╞══════╡
# │ null │
# │ 2.5  │
# │ 1.5  │
# └──────┘

.first(column = nil) ⇒ Object Originally defined in module LazyFunctions

Get the first value.

.fold(acc, f, exprs) ⇒ Expr Originally defined in module LazyFunctions

Accumulate over multiple columns horizontally/row wise with a left fold.
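
A horizontal left fold can be pictured with a plain-Ruby model in which columns are ordinary arrays. The helper name fold_model is hypothetical, and broadcasting a scalar initial accumulator to every row is an assumption of the sketch, not a statement about the Polars implementation.

```ruby
# Plain-Ruby model of a horizontal (row-wise) left fold (illustrative only).
# The accumulator starts as `init` for every row and is combined with each
# column in turn, from left to right.
def fold_model(init, columns)
  start = Array.new(columns.first.length, init)
  columns.reduce(start) do |acc, col|
    acc.zip(col).map { |running, value| yield(running, value) }
  end
end

a = [1, 2]
b = [3, 4]
c = [5, 6]
row_sums = fold_model(0, [a, b, c]) { |x, y| x + y }
# row_sums == [9, 12]
row_max = fold_model(-Float::INFINITY, [a, b, c]) { |x, y| [x, y].max }
# row_max == [5, 6]
```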

.format(fstring, *args) ⇒ Expr Originally defined in module LazyFunctions

Format expressions as a string.

Examples:

df = Polars::DataFrame.new(
  {
    "a": ["a", "b", "c"],
    "b": [1, 2, 3]
  }
)
df.select(
  [
    Polars.format("foo_{}_bar_{}", Polars.col("a"), "b").alias("fmt")
  ]
)
# =>
# shape: (3, 1)
# ┌─────────────┐
# │ fmt         │
# │ ---         │
# │ str         │
# ╞═════════════╡
# │ foo_a_bar_1 │
# │ foo_b_bar_2 │
# │ foo_c_bar_3 │
# └─────────────┘

.from_epoch(column, unit: "s", eager: false) ⇒ Object Originally defined in module LazyFunctions

Utility function that parses an epoch timestamp (or Unix time) to Polars Date(time).

Depending on the unit provided, this function will return a different dtype:

  • unit: "d" returns pl.Date
  • unit: "s" returns pl.Datetime["us"]
  • unit: "ms" returns pl.Datetime["ms"]
  • unit: "us" returns pl.Datetime["us"]
  • unit: "ns" returns pl.Datetime["ns"]

Examples:

df = Polars::DataFrame.new({"timestamp" => [1666683077, 1666683099]}).lazy
df.select(Polars.from_epoch(Polars.col("timestamp"), unit: "s")).collect
# =>
# shape: (2, 1)
# ┌─────────────────────┐
# │ timestamp           │
# │ ---                 │
# │ datetime[μs]        │
# ╞═════════════════════╡
# │ 2022-10-25 07:31:17 │
# │ 2022-10-25 07:31:39 │
# └─────────────────────┘
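
The unit handling can be cross-checked with Ruby's standard library. The sketch below is a plain-Ruby model, not the Polars code path; the helper name epoch_to_time and the UNITS_PER_SECOND table are hypothetical.

```ruby
require "date"

# Number of epoch units in one second, for the sub-day units listed above.
UNITS_PER_SECOND = {
  "s" => 1, "ms" => 1_000, "us" => 1_000_000, "ns" => 1_000_000_000
}.freeze

# Plain-Ruby model: days become a Date, everything else a UTC Time.
def epoch_to_time(value, unit: "s")
  if unit == "d"
    Date.new(1970, 1, 1) + value
  else
    Time.at(Rational(value, UNITS_PER_SECOND.fetch(unit))).utc
  end
end

from_seconds = epoch_to_time(1666683077).strftime("%Y-%m-%d %H:%M:%S")
# from_seconds == "2022-10-25 07:31:17"
from_millis = epoch_to_time(1666683077000, unit: "ms")
from_days = epoch_to_time(19290, unit: "d")
# from_days == Date.new(2022, 10, 25)
```

The seconds result agrees with the first row of the example output above.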

.from_hash(data, schema: nil, columns: nil) ⇒ DataFrame Originally defined in module Convert

Construct a DataFrame from a hash of sequences.

This operation clones data, unless you pass in a Hash<String, Series>.

Examples:

data = {"a" => [1, 2], "b" => [3, 4]}
Polars.from_hash(data)
# =>
# shape: (2, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1   ┆ 3   │
# │ 2   ┆ 4   │
# └─────┴─────┘

.get_dummies(df, columns: nil) ⇒ DataFrame Originally defined in module Functions

Convert categorical variables into dummy/indicator variables.
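
For readers unfamiliar with dummy/indicator encoding, here is a plain-Ruby model of the idea: one 0/1 column per distinct value. The helper name dummies_model is hypothetical, and the "name_value" column naming and the sorted column order are assumptions of this sketch; check the actual output of get_dummies for the exact names and dtypes.

```ruby
# Plain-Ruby model of dummy/indicator encoding for one column (illustrative only).
# Returns a hash mapping each new column name to an array of 0/1 flags.
def dummies_model(column_name, values)
  values.uniq.sort.to_h do |category|
    ["#{column_name}_#{category}", values.map { |v| v == category ? 1 : 0 }]
  end
end

encoded = dummies_model("color", ["red", "blue", "red"])
# encoded == { "color_blue" => [0, 1, 0], "color_red" => [1, 0, 1] }
```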

.groups(column) ⇒ Object Originally defined in module LazyFunctions

Syntactic sugar for Polars.col("foo").agg_groups.

.head(column, n = 10) ⇒ Object Originally defined in module LazyFunctions

Get the first n rows.

.int_range(start, stop, step: 1, eager: false, dtype: nil) ⇒ Expr, Series Also known as: arange Originally defined in module LazyFunctions

Create a range expression (or Series).

This can be used in a select, with_column, etc. Be sure that the resulting range size is equal to the length of the DataFrame you are collecting.

Examples:

Polars.arange(0, 3, eager: true)
# =>
# shape: (3,)
# Series: 'arange' [i64]
# [
#         0
#         1
#         2
# ]

.last(column = nil) ⇒ Object Originally defined in module LazyFunctions

Get the last value.

Depending on the input type this function does different things:

  • nil -> expression to take last column of a context.
  • String -> syntactic sugar for Polars.col(..).last
  • Series -> Take last value in Series

.lit(value, dtype: nil, allow_object: nil) ⇒ Expr Originally defined in module LazyFunctions

Return an expression representing a literal value.

.max(column) ⇒ Expr, Object Originally defined in module LazyFunctions

Get the maximum value.

.mean(column) ⇒ Expr, Float Originally defined in module LazyFunctions

Get the mean value.

.median(column) ⇒ Object Originally defined in module LazyFunctions

Get the median value.

.min(column) ⇒ Expr, Object Originally defined in module LazyFunctions

Get the minimum value.

.n_unique(column) ⇒ Object Originally defined in module LazyFunctions

Count unique values.

.ones(n, dtype: nil) ⇒ Series Originally defined in module Functions

Note:

In the lazy API you should probably not use this, but use lit(1) instead.

Return a new Series of given length and type, filled with ones.

.pearson_corr(a, b, ddof: 1) ⇒ Expr Originally defined in module LazyFunctions

Compute the Pearson correlation between two columns.
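
For reference, the Pearson coefficient is the covariance of the two columns divided by the product of their standard deviations. The plain-Ruby sketch below models that formula on ordinary arrays; the helper name pearson_model is hypothetical. The 1/(n - ddof) factors cancel between numerator and denominator, so the sketch omits them.

```ruby
# Plain-Ruby model of the Pearson correlation coefficient (illustrative only).
def pearson_model(a, b)
  n = a.length.to_f
  mean_a = a.sum / n
  mean_b = b.sum / n
  # Sums of cross-deviations and squared deviations from the means.
  cross = a.zip(b).sum { |x, y| (x - mean_a) * (y - mean_b) }
  ss_a = a.sum { |x| (x - mean_a)**2 }
  ss_b = b.sum { |y| (y - mean_b)**2 }
  cross / Math.sqrt(ss_a * ss_b)
end

perfect = pearson_model([1, 2, 3], [2, 4, 6])  # perfectly correlated
inverse = pearson_model([1, 2, 3], [6, 4, 2])  # perfectly anti-correlated
```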

.quantile(column, quantile, interpolation: "nearest") ⇒ Expr Originally defined in module LazyFunctions

Syntactic sugar for Polars.col("foo").quantile(...).

.read_avro(source, columns: nil, n_rows: nil) ⇒ DataFrame Originally defined in module IO

Read into a DataFrame from Apache Avro format.

.read_csv(source, has_header: true, columns: nil, new_columns: nil, sep: ",", comment_char: nil, quote_char: '"', skip_rows: 0, dtypes: nil, null_values: nil, ignore_errors: false, parse_dates: false, n_threads: nil, infer_schema_length: 100, batch_size: 8192, n_rows: nil, encoding: "utf8", low_memory: false, rechunk: true, storage_options: nil, skip_rows_after_header: 0, row_count_name: nil, row_count_offset: 0, sample_size: 1024, eol_char: "\n") ⇒ DataFrame Originally defined in module IO

Note:

This operation defaults to a rechunk operation at the end, meaning that all data will be stored contiguously in memory. Set rechunk: false if you are benchmarking the CSV reader. A rechunk is an expensive operation.

Read a CSV file into a DataFrame.

.read_csv_batched(source, has_header: true, columns: nil, new_columns: nil, sep: ",", comment_char: nil, quote_char: '"', skip_rows: 0, dtypes: nil, null_values: nil, ignore_errors: false, parse_dates: false, n_threads: nil, infer_schema_length: 100, batch_size: 50_000, n_rows: nil, encoding: "utf8", low_memory: false, rechunk: true, skip_rows_after_header: 0, row_count_name: nil, row_count_offset: 0, sample_size: 1024, eol_char: "\n") ⇒ BatchedCsvReader Originally defined in module IO

Read a CSV file in batches.

Upon creation of the BatchedCsvReader, Polars will gather statistics and determine the file chunks. After that, work will only be done when next_batches is called.

Examples:

reader = Polars.read_csv_batched(
  "./tpch/tables_scale_100/lineitem.tbl", sep: "|", parse_dates: true
)
reader.next_batches(5)

.read_database(query) ⇒ DataFrame Also known as: read_sql Originally defined in module IO

Read a SQL query into a DataFrame.

.read_ipc(source, columns: nil, n_rows: nil, memory_map: true, storage_options: nil, row_count_name: nil, row_count_offset: 0, rechunk: true) ⇒ DataFrame Originally defined in module IO

Read into a DataFrame from Arrow IPC (Feather v2) file.

.read_ipc_schema(source) ⇒ Hash Originally defined in module IO

Get a schema of the IPC file without reading data.

.read_json(source) ⇒ DataFrame Originally defined in module IO

Read into a DataFrame from a JSON file.

.read_ndjson(source) ⇒ DataFrame Originally defined in module IO

Read into a DataFrame from a newline delimited JSON file.

.read_parquet(source, columns: nil, n_rows: nil, storage_options: nil, parallel: "auto", row_count_name: nil, row_count_offset: 0, low_memory: false, use_statistics: true, rechunk: true) ⇒ DataFrame Originally defined in module IO

Note:

This operation defaults to a rechunk operation at the end, meaning that all data will be stored contiguously in memory. Set rechunk: false if you are benchmarking the Parquet reader. A rechunk is an expensive operation.

Read into a DataFrame from a Parquet file.

.read_parquet_schema(source) ⇒ Hash Originally defined in module IO

Get a schema of the Parquet file without reading data.

.repeat(value, n, dtype: nil, eager: false, name: nil) ⇒ Expr Originally defined in module LazyFunctions

Repeat a single value n times.

.scan_csv(source, has_header: true, sep: ",", comment_char: nil, quote_char: '"', skip_rows: 0, dtypes: nil, null_values: nil, ignore_errors: false, cache: true, with_column_names: nil, infer_schema_length: 100, n_rows: nil, encoding: "utf8", low_memory: false, rechunk: true, skip_rows_after_header: 0, row_count_name: nil, row_count_offset: 0, parse_dates: false, eol_char: "\n") ⇒ LazyFrame Originally defined in module IO

Lazily read from a CSV file or multiple files via glob patterns.

This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.

.scan_ipc(source, n_rows: nil, cache: true, rechunk: true, row_count_name: nil, row_count_offset: 0, storage_options: nil, memory_map: true) ⇒ LazyFrame Originally defined in module IO

Lazily read from an Arrow IPC (Feather v2) file or multiple files via glob patterns.

This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.

.scan_ndjson(source, infer_schema_length: 100, batch_size: 1024, n_rows: nil, low_memory: false, rechunk: true, row_count_name: nil, row_count_offset: 0) ⇒ LazyFrame Originally defined in module IO

Lazily read from a newline delimited JSON file.

This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.

.scan_parquet(source, n_rows: nil, cache: true, parallel: "auto", rechunk: true, row_count_name: nil, row_count_offset: 0, storage_options: nil, low_memory: false) ⇒ LazyFrame Originally defined in module IO

Lazily read from a Parquet file or multiple files via glob patterns.

This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.

.select(exprs) ⇒ DataFrame Originally defined in module LazyFunctions

Run polars expressions without a context.

.spearman_rank_corr(a, b, ddof: 1, propagate_nans: false) ⇒ Expr Originally defined in module LazyFunctions

Compute the Spearman rank correlation between two columns.

Missing data will be excluded from the computation.

.std(column, ddof: 1) ⇒ Object Originally defined in module LazyFunctions

Get the standard deviation.
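
The ddof ("delta degrees of freedom") argument, shared with var, sets the divisor to n - ddof. A plain-Ruby model on an ordinary array shows the effect; the helper names variance_model and std_model are hypothetical and this is not the Polars implementation.

```ruby
# Plain-Ruby model of variance and standard deviation with ddof (illustrative only).
def variance_model(values, ddof: 1)
  mean = values.sum.to_f / values.length
  # Divide the sum of squared deviations by (n - ddof).
  values.sum { |v| (v - mean)**2 } / (values.length - ddof)
end

def std_model(values, ddof: 1)
  Math.sqrt(variance_model(values, ddof: ddof))
end

data = [1, 2, 3, 4]
sample_var = variance_model(data)               # divisor n - 1 = 3
population_var = variance_model(data, ddof: 0)  # divisor n = 4
sample_std = std_model(data)
```

With the default ddof: 1 the result is the sample estimate; ddof: 0 gives the population value.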

.struct(exprs, eager: false) ⇒ Object Originally defined in module LazyFunctions

Collect several columns into a Series of dtype Struct.

Examples:

Polars::DataFrame.new(
  {
    "int" => [1, 2],
    "str" => ["a", "b"],
    "bool" => [true, nil],
    "list" => [[1, 2], [3]],
  }
).select([Polars.struct(Polars.all).alias("my_struct")])
# =>
# shape: (2, 1)
# ┌─────────────────────┐
# │ my_struct           │
# │ ---                 │
# │ struct[4]           │
# ╞═════════════════════╡
# │ {1,"a",true,[1, 2]} │
# │ {2,"b",null,[3]}    │
# └─────────────────────┘

Only collect specific columns as a struct:

df = Polars::DataFrame.new(
  {"a" => [1, 2, 3, 4], "b" => ["one", "two", "three", "four"], "c" => [9, 8, 7, 6]}
)
df.with_column(Polars.struct(Polars.col(["a", "b"])).alias("a_and_b"))
# =>
# shape: (4, 4)
# ┌─────┬───────┬─────┬─────────────┐
# │ a   ┆ b     ┆ c   ┆ a_and_b     │
# │ --- ┆ ---   ┆ --- ┆ ---         │
# │ i64 ┆ str   ┆ i64 ┆ struct[2]   │
# ╞═════╪═══════╪═════╪═════════════╡
# │ 1   ┆ one   ┆ 9   ┆ {1,"one"}   │
# │ 2   ┆ two   ┆ 8   ┆ {2,"two"}   │
# │ 3   ┆ three ┆ 7   ┆ {3,"three"} │
# │ 4   ┆ four  ┆ 6   ┆ {4,"four"}  │
# └─────┴───────┴─────┴─────────────┘

.sum(column) ⇒ Object Originally defined in module LazyFunctions

Sum values in a column/Series, or horizontally across a list of columns/expressions.

.tail(column, n = 10) ⇒ Object Originally defined in module LazyFunctions

Get the last n rows.

.to_list(name) ⇒ Expr Originally defined in module LazyFunctions

Aggregate to list.

.var(column, ddof: 1) ⇒ Object Originally defined in module LazyFunctions

Get the variance.

.when(expr) ⇒ When Originally defined in module LazyFunctions

Start a "when, then, otherwise" expression.

Examples:

df = Polars::DataFrame.new({"foo" => [1, 3, 4], "bar" => [3, 4, 0]})
df.with_column(Polars.when(Polars.col("foo") > 2).then(Polars.lit(1)).otherwise(Polars.lit(-1)))
# =>
# shape: (3, 3)
# ┌─────┬─────┬─────────┐
# │ foo ┆ bar ┆ literal │
# │ --- ┆ --- ┆ ---     │
# │ i64 ┆ i64 ┆ i32     │
# ╞═════╪═════╪═════════╡
# │ 1   ┆ 3   ┆ -1      │
# │ 3   ┆ 4   ┆ 1       │
# │ 4   ┆ 0   ┆ 1       │
# └─────┴─────┴─────────┘

.zeros(n, dtype: nil) ⇒ Series Originally defined in module Functions

Note:

In the lazy API you should probably not use this, but use lit(0) instead.

Return a new Series of given length and type, filled with zeros.