Module: Polars

Extended by: Convert, Functions, IO, LazyFunctions

Defined in: lib/polars.rb,
lib/polars/io.rb,
lib/polars/expr.rb,
lib/polars/plot.rb,
lib/polars/when.rb,
lib/polars/slice.rb,
lib/polars/utils.rb,
lib/polars/config.rb,
lib/polars/series.rb,
lib/polars/convert.rb,
lib/polars/version.rb,
lib/polars/cat_expr.rb,
lib/polars/group_by.rb,
lib/polars/functions.rb,
lib/polars/list_expr.rb,
lib/polars/meta_expr.rb,
lib/polars/name_expr.rb,
lib/polars/when_then.rb,
lib/polars/array_expr.rb,
lib/polars/data_frame.rb,
lib/polars/data_types.rb,
lib/polars/exceptions.rb,
lib/polars/lazy_frame.rb,
lib/polars/binary_expr.rb,
lib/polars/sql_context.rb,
lib/polars/string_expr.rb,
lib/polars/struct_expr.rb,
lib/polars/expr_dispatch.rb,
lib/polars/lazy_group_by.rb,
lib/polars/cat_name_space.rb,
lib/polars/date_time_expr.rb,
lib/polars/lazy_functions.rb,
lib/polars/list_name_space.rb,
lib/polars/array_name_space.rb,
lib/polars/dynamic_group_by.rb,
lib/polars/rolling_group_by.rb,
lib/polars/binary_name_space.rb,
lib/polars/string_name_space.rb,
lib/polars/struct_name_space.rb,
lib/polars/batched_csv_reader.rb,
lib/polars/date_time_name_space.rb

Defined Under Namespace

Modules: Convert, Functions, IO, LazyFunctions, Plot

Classes: Array, ArrayExpr, ArrayNameSpace, Binary, BinaryExpr, BinaryNameSpace, Boolean, CatExpr, CatNameSpace, Categorical, Config, DataFrame, DataType, Date, DateTimeExpr, DateTimeNameSpace, Datetime, Decimal, Duration, DynamicGroupBy, Expr, Field, Float32, Float64, FloatType, FractionalType, GroupBy, Int16, Int32, Int64, Int8, IntegralType, LazyFrame, LazyGroupBy, List, ListExpr, ListNameSpace, MetaExpr, NameExpr, NestedType, Null, NumericType, Object, RollingGroupBy, SQLContext, Series, String, StringExpr, StringNameSpace, Struct, StructExpr, StructNameSpace, TemporalType, Time, UInt16, UInt32, UInt64, UInt8, Unknown

Constant Summary

Utf8 = String
Allow Utf8 as an alias for String.

Class Method Summary

.align_frames(*frames, on:, select: nil, reverse: false) ⇒ Object extended from Functions
Align a sequence of frames using the unique values from one or more columns as a key.
.all(name = nil) ⇒ Expr extended from LazyFunctions
Perform a columnwise or elementwise AND operation, or select all columns as a wildcard.
.any(name) ⇒ Expr extended from LazyFunctions
Evaluate columnwise or elementwise with a bitwise OR operation.
.arg_sort_by(exprs, reverse: false) ⇒ Expr (also: #argsort_by) extended from LazyFunctions
Find the indexes that would sort the columns.
.arg_where(condition, eager: false) ⇒ Expr, Series extended from LazyFunctions
Return indices where condition evaluates true.
.avg(column) ⇒ Expr, Float extended from LazyFunctions
Get the mean value.
.coalesce(exprs, *more_exprs) ⇒ Expr extended from LazyFunctions
Folds the expressions from left to right, keeping the first non-null value.
.col(name) ⇒ Expr extended from LazyFunctions
Return an expression representing a column in a DataFrame.
.collect_all(lazy_frames, type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ Array extended from LazyFunctions
Collect multiple LazyFrames at the same time.
.concat(items, rechunk: true, how: "vertical", parallel: true) ⇒ Object extended from Functions
Aggregate multiple DataFrames/Series to a single DataFrame/Series.
.concat_list(exprs) ⇒ Expr extended from LazyFunctions
Concatenate the arrays in a Series of dtype List in linear time.
.concat_str(exprs, sep: "") ⇒ Expr extended from LazyFunctions
Horizontally concat Utf8 Series in linear time.
.count(column = nil) ⇒ Expr, Integer extended from LazyFunctions
Count the number of values in this column/context.
.cov(a, b) ⇒ Expr extended from LazyFunctions
Compute the covariance between two columns/expressions.
.cumfold(acc, f, exprs, include_init: false) ⇒ Object extended from LazyFunctions
Cumulatively accumulate over multiple columns horizontally/row wise with a left fold.
.cumsum(column) ⇒ Object extended from LazyFunctions
Cumulatively sum values in a column/Series, or horizontally across list of columns/expressions.
.date_range(start, stop, interval, lazy: false, closed: "both", name: nil, time_unit: nil, time_zone: nil) ⇒ Object extended from Functions
Create a range of type Datetime (or Date).
.duration(weeks: nil, days: nil, hours: nil, minutes: nil, seconds: nil, milliseconds: nil, microseconds: nil, nanoseconds: nil, time_unit: "us") ⇒ Expr extended from LazyFunctions
Create a Polars Duration from distinct time components.
.element ⇒ Expr extended from LazyFunctions
Alias for an element being evaluated in an eval expression.
.exclude(columns) ⇒ Object extended from LazyFunctions
Exclude certain columns from a wildcard/regex selection.
.first(column = nil) ⇒ Object extended from LazyFunctions
Get the first value.
.fold(acc, f, exprs) ⇒ Expr extended from LazyFunctions
Accumulate over multiple columns horizontally/row wise with a left fold.
.format(fstring, *args) ⇒ Expr extended from LazyFunctions
Format expressions as a string.
.from_epoch(column, unit: "s", eager: false) ⇒ Object extended from LazyFunctions
Utility function that parses an epoch timestamp (or Unix time) to Polars Date(time).
.from_hash(data, schema: nil, columns: nil) ⇒ DataFrame extended from Convert
Construct a DataFrame from a hash of sequences.
.get_dummies(df, columns: nil) ⇒ DataFrame extended from Functions
Convert categorical variables into dummy/indicator variables.
.groups(column) ⇒ Object extended from LazyFunctions
Syntactic sugar for Polars.col("foo").agg_groups.
.head(column, n = 10) ⇒ Object extended from LazyFunctions
Get the first n rows.
.int_range(start, stop, step: 1, eager: false, dtype: nil) ⇒ Expr, Series (also: #arange) extended from LazyFunctions
Create a range expression (or Series).
.last(column = nil) ⇒ Object extended from LazyFunctions
Get the last value.
.lit(value, dtype: nil, allow_object: nil) ⇒ Expr extended from LazyFunctions
Return an expression representing a literal value.
.max(column) ⇒ Expr, Object extended from LazyFunctions
Get the maximum value.
.mean(column) ⇒ Expr, Float extended from LazyFunctions
Get the mean value.
.median(column) ⇒ Object extended from LazyFunctions
Get the median value.
.min(column) ⇒ Expr, Object extended from LazyFunctions
Get the minimum value.
.n_unique(column) ⇒ Object extended from LazyFunctions
Count unique values.
.ones(n, dtype: nil) ⇒ Series extended from Functions
Return a new Series of given length and type, filled with ones.
.pearson_corr(a, b, ddof: 1) ⇒ Expr extended from LazyFunctions
Compute the Pearson correlation between two columns.
.quantile(column, quantile, interpolation: "nearest") ⇒ Expr extended from LazyFunctions
Syntactic sugar for Polars.col("foo").quantile(...).
.read_avro(source, columns: nil, n_rows: nil) ⇒ DataFrame extended from IO
Read into a DataFrame from Apache Avro format.
.read_csv(source, has_header: true, columns: nil, new_columns: nil, sep: ",", comment_char: nil, quote_char: '"', skip_rows: 0, dtypes: nil, null_values: nil, ignore_errors: false, parse_dates: false, n_threads: nil, infer_schema_length: 100, batch_size: 8192, n_rows: nil, encoding: "utf8", low_memory: false, rechunk: true, storage_options: nil, skip_rows_after_header: 0, row_count_name: nil, row_count_offset: 0, sample_size: 1024, eol_char: "\n") ⇒ DataFrame extended from IO
Read a CSV file into a DataFrame.
.read_csv_batched(source, has_header: true, columns: nil, new_columns: nil, sep: ",", comment_char: nil, quote_char: '"', skip_rows: 0, dtypes: nil, null_values: nil, ignore_errors: false, parse_dates: false, n_threads: nil, infer_schema_length: 100, batch_size: 50_000, n_rows: nil, encoding: "utf8", low_memory: false, rechunk: true, skip_rows_after_header: 0, row_count_name: nil, row_count_offset: 0, sample_size: 1024, eol_char: "\n") ⇒ BatchedCsvReader extended from IO
Read a CSV file in batches.
.read_database(query) ⇒ DataFrame (also: #read_sql) extended from IO
Read a SQL query into a DataFrame.
.read_ipc(source, columns: nil, n_rows: nil, memory_map: true, storage_options: nil, row_count_name: nil, row_count_offset: 0, rechunk: true) ⇒ DataFrame extended from IO
Read into a DataFrame from Arrow IPC (Feather v2) file.
.read_ipc_schema(source) ⇒ Hash extended from IO
Get a schema of the IPC file without reading data.
.read_json(source) ⇒ DataFrame extended from IO
Read into a DataFrame from a JSON file.
.read_ndjson(source) ⇒ DataFrame extended from IO
Read into a DataFrame from a newline delimited JSON file.
.read_parquet(source, columns: nil, n_rows: nil, storage_options: nil, parallel: "auto", row_count_name: nil, row_count_offset: 0, low_memory: false, use_statistics: true, rechunk: true) ⇒ DataFrame extended from IO
Read into a DataFrame from a parquet file.
.read_parquet_schema(source) ⇒ Hash extended from IO
Get a schema of the Parquet file without reading data.
.repeat(value, n, dtype: nil, eager: false, name: nil) ⇒ Expr extended from LazyFunctions
Repeat a single value n times.
.scan_csv(source, has_header: true, sep: ",", comment_char: nil, quote_char: '"', skip_rows: 0, dtypes: nil, null_values: nil, ignore_errors: false, cache: true, with_column_names: nil, infer_schema_length: 100, n_rows: nil, encoding: "utf8", low_memory: false, rechunk: true, skip_rows_after_header: 0, row_count_name: nil, row_count_offset: 0, parse_dates: false, eol_char: "\n") ⇒ LazyFrame extended from IO
Lazily read from a CSV file or multiple files via glob patterns.
.scan_ipc(source, n_rows: nil, cache: true, rechunk: true, row_count_name: nil, row_count_offset: 0, storage_options: nil, memory_map: true) ⇒ LazyFrame extended from IO
Lazily read from an Arrow IPC (Feather v2) file or multiple files via glob patterns.
.scan_ndjson(source, infer_schema_length: 100, batch_size: 1024, n_rows: nil, low_memory: false, rechunk: true, row_count_name: nil, row_count_offset: 0) ⇒ LazyFrame extended from IO
Lazily read from a newline delimited JSON file.
.scan_parquet(source, n_rows: nil, cache: true, parallel: "auto", rechunk: true, row_count_name: nil, row_count_offset: 0, storage_options: nil, low_memory: false) ⇒ LazyFrame extended from IO
Lazily read from a parquet file or multiple files via glob patterns.
.select(exprs) ⇒ DataFrame extended from LazyFunctions
Run polars expressions without a context.
.spearman_rank_corr(a, b, ddof: 1, propagate_nans: false) ⇒ Expr extended from LazyFunctions
Compute the Spearman rank correlation between two columns.
.std(column, ddof: 1) ⇒ Object extended from LazyFunctions
Get the standard deviation.
.struct(exprs, eager: false) ⇒ Object extended from LazyFunctions
Collect several columns into a Series of dtype Struct.
.sum(column) ⇒ Object extended from LazyFunctions
Sum values in a column/Series, or horizontally across list of columns/expressions.
.tail(column, n = 10) ⇒ Object extended from LazyFunctions
Get the last n rows.
.to_list(name) ⇒ Expr extended from LazyFunctions
Aggregate to list.
.var(column, ddof: 1) ⇒ Object extended from LazyFunctions
Get the variance.
.when(expr) ⇒ When extended from LazyFunctions
Start a "when, then, otherwise" expression.
.zeros(n, dtype: nil) ⇒ Series extended from Functions
Return a new Series of given length and type, filled with zeros.

Class Method Details

.align_frames(*frames, on:, select: nil, reverse: false) ⇒ `Object` Originally defined in module Functions

Align a sequence of frames using the unique values from one or more columns as a key.

Frames that do not contain the given key values have rows injected (with nulls filling the non-key columns), and each resulting frame is sorted by the key.

The original column order of input frames is not changed unless select is specified (in which case the final column order is determined from that).

Note that this does not result in a joined frame - you receive the same number of frames back that you passed in, but each is now aligned by key and has the same number of rows.

Examples:

df1 = Polars::DataFrame.new(
  {
    "dt" => [Date.new(2022, 9, 1), Date.new(2022, 9, 2), Date.new(2022, 9, 3)],
    "x" => [3.5, 4.0, 1.0],
    "y" => [10.0, 2.5, 1.5]
  }
)
df2 = Polars::DataFrame.new(
  {
    "dt" => [Date.new(2022, 9, 2), Date.new(2022, 9, 3), Date.new(2022, 9, 1)],
    "x" => [8.0, 1.0, 3.5],
    "y" => [1.5, 12.0, 5.0]
  }
)
df3 = Polars::DataFrame.new(
  {
    "dt" => [Date.new(2022, 9, 3), Date.new(2022, 9, 2)],
    "x" => [2.0, 5.0],
    "y" => [2.5, 2.0]
  }
)
af1, af2, af3 = Polars.align_frames(
  df1, df2, df3, on: "dt", select: ["x", "y"]
)
(af1 * af2 * af3).fill_null(0).select(Polars.sum(Polars.col("*")).alias("dot"))
# =>
# shape: (3, 1)
# ┌───────┐
# │ dot   │
# │ ---   │
# │ f64   │
# ╞═══════╡
# │ 0.0   │
# ├╌╌╌╌╌╌╌┤
# │ 167.5 │
# ├╌╌╌╌╌╌╌┤
# │ 47.0  │
# └───────┘

.all(name = nil) ⇒ `Expr` Originally defined in module LazyFunctions

Do one of two things: perform a columnwise or elementwise AND operation, or select all columns as a wildcard.

Examples:

Sum all columns

df = Polars::DataFrame.new(
  {"a" => [1, 2, 3], "b" => ["hello", "foo", "bar"], "c" => [1, 1, 1]}
)
df.select(Polars.all.sum)
# =>
# shape: (1, 3)
# ┌─────┬──────┬─────┐
# │ a   ┆ b    ┆ c   │
# │ --- ┆ ---  ┆ --- │
# │ i64 ┆ str  ┆ i64 │
# ╞═════╪══════╪═════╡
# │ 6   ┆ null ┆ 3   │
# └─────┴──────┴─────┘

.any(name) ⇒ `Expr` Originally defined in module LazyFunctions

Evaluate columnwise or elementwise with a bitwise OR operation.

.arg_sort_by(exprs, reverse: false) ⇒ `Expr` Also known as: argsort_by Originally defined in module LazyFunctions

Find the indexes that would sort the columns.

Argsort by multiple columns. The first column will be used for the ordering. If there are duplicates in the first column, the second column will be used to determine the ordering and so on.
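The tie-breaking rule described above can be pictured with a plain-Ruby model. This is only a sketch of the ordering semantics, not the Polars implementation; the helper name `arg_sort_by_sketch` and the sample arrays are hypothetical, and columns are modeled as ordinary Ruby arrays.

```ruby
# Plain-Ruby model of a multi-column argsort (hypothetical helper,
# not part of the Polars API).
def arg_sort_by_sketch(columns)
  indexes = (0...columns.first.length).to_a
  # Sort row indexes by the tuple of values taken from each column.
  # Arrays compare element by element, so later columns only matter
  # when all earlier columns are tied. The index is appended as the
  # last key so that fully tied rows keep their original order.
  indexes.sort_by { |i| columns.map { |col| col[i] } + [i] }
end

a = [2, 1, 2, 1]
b = [9, 5, 3, 7]
order = arg_sort_by_sketch([a, b])
puts order.inspect
```

Rows 1 and 3 tie on the first column (both 1), so the second column decides that row 1 (5) comes before row 3 (7).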

.arg_where(condition, eager: false) ⇒ `Expr`, `Series` Originally defined in module LazyFunctions

Return indices where condition evaluates true.

Examples:

df = Polars::DataFrame.new({"a" => [1, 2, 3, 4, 5]})
df.select(
  [
    Polars.arg_where(Polars.col("a") % 2 == 0)
  ]
).to_series
# =>
# shape: (2,)
# Series: 'a' [u32]
# [
#         1
#         3
# ]

.avg(column) ⇒ `Expr`, `Float` Originally defined in module LazyFunctions

Get the mean value.

.coalesce(exprs, *more_exprs) ⇒ `Expr` Originally defined in module LazyFunctions

Folds the expressions from left to right, keeping the first non-null value.

Examples:

df = Polars::DataFrame.new(
  [
    [nil, 1.0, 1.0],
    [nil, 2.0, 2.0],
    [nil, nil, 3.0],
    [nil, nil, nil]
  ],
  columns: [["a", :f64], ["b", :f64], ["c", :f64]]
)
df.with_column(Polars.coalesce(["a", "b", "c", 99.9]).alias("d"))
# =>
# shape: (4, 4)
# ┌──────┬──────┬──────┬──────┐
# │ a    ┆ b    ┆ c    ┆ d    │
# │ ---  ┆ ---  ┆ ---  ┆ ---  │
# │ f64  ┆ f64  ┆ f64  ┆ f64  │
# ╞══════╪══════╪══════╪══════╡
# │ null ┆ 1.0  ┆ 1.0  ┆ 1.0  │
# │ null ┆ 2.0  ┆ 2.0  ┆ 2.0  │
# │ null ┆ null ┆ 3.0  ┆ 3.0  │
# │ null ┆ null ┆ null ┆ 99.9 │
# └──────┴──────┴──────┴──────┘

.col(name) ⇒ `Expr` Originally defined in module LazyFunctions

Return an expression representing a column in a DataFrame.

.collect_all(lazy_frames, type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ `Array` Originally defined in module LazyFunctions

Collect multiple LazyFrames at the same time.

This runs all the computation graphs in parallel on the Polars thread pool.

.concat(items, rechunk: true, how: "vertical", parallel: true) ⇒ `Object` Originally defined in module Functions

Aggregate multiple DataFrames/Series to a single DataFrame/Series.

Examples:

df1 = Polars::DataFrame.new({"a" => [1], "b" => [3]})
df2 = Polars::DataFrame.new({"a" => [2], "b" => [4]})
Polars.concat([df1, df2])
# =>
# shape: (2, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1   ┆ 3   │
# │ 2   ┆ 4   │
# └─────┴─────┘

.concat_list(exprs) ⇒ `Expr` Originally defined in module LazyFunctions

Concatenate the arrays in a Series of dtype List in linear time.

.concat_str(exprs, sep: "") ⇒ `Expr` Originally defined in module LazyFunctions

Horizontally concat Utf8 Series in linear time. Non-Utf8 columns are cast to Utf8.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2, 3],
    "b" => ["dogs", "cats", nil],
    "c" => ["play", "swim", "walk"]
  }
)
df.with_columns(
  [
    Polars.concat_str(
      [
        Polars.col("a") * 2,
        Polars.col("b"),
        Polars.col("c")
      ],
      sep: " "
    ).alias("full_sentence")
  ]
)
# =>
# shape: (3, 4)
# ┌─────┬──────┬──────┬───────────────┐
# │ a   ┆ b    ┆ c    ┆ full_sentence │
# │ --- ┆ ---  ┆ ---  ┆ ---           │
# │ i64 ┆ str  ┆ str  ┆ str           │
# ╞═════╪══════╪══════╪═══════════════╡
# │ 1   ┆ dogs ┆ play ┆ 2 dogs play   │
# │ 2   ┆ cats ┆ swim ┆ 4 cats swim   │
# │ 3   ┆ null ┆ walk ┆ null          │
# └─────┴──────┴──────┴───────────────┘

.count(column = nil) ⇒ `Expr`, `Integer` Originally defined in module LazyFunctions

Count the number of values in this column/context.

.cov(a, b) ⇒ `Expr` Originally defined in module LazyFunctions

Compute the covariance between two columns/expressions.

.cumfold(acc, f, exprs, include_init: false) ⇒ `Object` Originally defined in module LazyFunctions

Note:

If you simply want the first encountered expression as accumulator, consider using cumreduce.

Cumulatively accumulate over multiple columns horizontally/row wise with a left fold.

Every cumulative result is added as a separate field in a Struct column.
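As a rough illustration of the cumulative left fold, the following plain-Ruby sketch models each column as an array and each resulting Struct as an array of intermediate values. The helper `cumfold_sketch` is hypothetical, and the way `include_init` is handled here (prepending the initial accumulator as an extra field) is an assumption about the semantics rather than a statement of how Polars implements it.

```ruby
# Plain-Ruby model of a cumulative, row-wise left fold
# (hypothetical helper, not part of the Polars API).
def cumfold_sketch(acc, columns, include_init: false, &f)
  row_count = columns.first.length
  (0...row_count).map do |row|
    running = acc
    # One entry per intermediate result; this stands in for the
    # fields of the Struct produced for this row.
    steps = include_init ? [acc] : []
    columns.each do |col|
      running = f.call(running, col[row])
      steps << running
    end
    steps
  end
end

a = [1, 2]
b = [3, 4]
c = [5, 6]
result = cumfold_sketch(0, [a, b, c]) { |x, y| x + y }
puts result.inspect
```

For the first row the running sums are 1, 4 and 9; for the second row they are 2, 6 and 12.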

.cumsum(column) ⇒ `Object` Originally defined in module LazyFunctions

Cumulatively sum values in a column/Series, or horizontally across list of columns/expressions.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2],
    "b" => [3, 4],
    "c" => [5, 6]
  }
)
# =>
# shape: (2, 3)
# ┌─────┬─────┬─────┐
# │ a   ┆ b   ┆ c   │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 3   ┆ 5   │
# │ 2   ┆ 4   ┆ 6   │
# └─────┴─────┴─────┘

Cumulatively sum a column by name:

df.select(Polars.cumsum("a"))
# =>
# shape: (2, 1)
# ┌─────┐
# │ a   │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# │ 3   │
# └─────┘

Cumulatively sum a list of columns/expressions horizontally:

df.with_column(Polars.cumsum(["a", "c"]))
# =>
# shape: (2, 4)
# ┌─────┬─────┬─────┬───────────┐
# │ a   ┆ b   ┆ c   ┆ cumsum    │
# │ --- ┆ --- ┆ --- ┆ ---       │
# │ i64 ┆ i64 ┆ i64 ┆ struct[2] │
# ╞═════╪═════╪═════╪═══════════╡
# │ 1   ┆ 3   ┆ 5   ┆ {1,6}     │
# │ 2   ┆ 4   ┆ 6   ┆ {2,8}     │
# └─────┴─────┴─────┴───────────┘

.date_range(start, stop, interval, lazy: false, closed: "both", name: nil, time_unit: nil, time_zone: nil) ⇒ `Object` Originally defined in module Functions

Note:

If both start and stop are passed as date types (not datetime), and the interval granularity is no finer than 1d, the returned range is also of type date. All other permutations return a datetime Series.

Create a range of type Datetime (or Date).

Examples:

Using polars duration string to specify the interval

Polars.date_range(Date.new(2022, 1, 1), Date.new(2022, 3, 1), "1mo", name: "drange")
# =>
# shape: (3,)
# Series: 'drange' [date]
# [
#         2022-01-01
#         2022-02-01
#         2022-03-01
# ]

Using a compound duration string to specify the interval:

Polars.date_range(
    DateTime.new(1985, 1, 1),
    DateTime.new(1985, 1, 10),
    "1d12h",
    time_unit: "ms"
)
# =>
# shape: (7,)
# Series: '' [datetime[ms]]
# [
#         1985-01-01 00:00:00
#         1985-01-02 12:00:00
#         1985-01-04 00:00:00
#         1985-01-05 12:00:00
#         1985-01-07 00:00:00
#         1985-01-08 12:00:00
#         1985-01-10 00:00:00
# ]

.duration(weeks: nil, days: nil, hours: nil, minutes: nil, seconds: nil, milliseconds: nil, microseconds: nil, nanoseconds: nil, time_unit: "us") ⇒ `Expr` Originally defined in module LazyFunctions

Create a Polars Duration from distinct time components.

Examples:

df = Polars::DataFrame.new(
  {
    "datetime" => [DateTime.new(2022, 1, 1), DateTime.new(2022, 1, 2)],
    "add" => [1, 2]
  }
)
df.select(
  [
    (Polars.col("datetime") + Polars.duration(weeks: "add")).alias("add_weeks"),
    (Polars.col("datetime") + Polars.duration(days: "add")).alias("add_days"),
    (Polars.col("datetime") + Polars.duration(seconds: "add")).alias("add_seconds"),
    (Polars.col("datetime") + Polars.duration(milliseconds: "add")).alias(
      "add_milliseconds"
    ),
    (Polars.col("datetime") + Polars.duration(hours: "add")).alias("add_hours")
  ]
)
# =>
# shape: (2, 5)
# ┌─────────────────────┬─────────────────────┬─────────────────────┬─────────────────────────┬─────────────────────┐
# │ add_weeks           ┆ add_days            ┆ add_seconds         ┆ add_milliseconds        ┆ add_hours           │
# │ ---                 ┆ ---                 ┆ ---                 ┆ ---                     ┆ ---                 │
# │ datetime[ns]        ┆ datetime[ns]        ┆ datetime[ns]        ┆ datetime[ns]            ┆ datetime[ns]        │
# ╞═════════════════════╪═════════════════════╪═════════════════════╪═════════════════════════╪═════════════════════╡
# │ 2022-01-08 00:00:00 ┆ 2022-01-02 00:00:00 ┆ 2022-01-01 00:00:01 ┆ 2022-01-01 00:00:00.001 ┆ 2022-01-01 01:00:00 │
# │ 2022-01-16 00:00:00 ┆ 2022-01-04 00:00:00 ┆ 2022-01-02 00:00:02 ┆ 2022-01-02 00:00:00.002 ┆ 2022-01-02 02:00:00 │
# └─────────────────────┴─────────────────────┴─────────────────────┴─────────────────────────┴─────────────────────┘

.element ⇒ `Expr` Originally defined in module LazyFunctions

Alias for an element being evaluated in an eval expression.

Examples:

A horizontal rank computation by taking the elements of a list

df = Polars::DataFrame.new({"a" => [1, 8, 3], "b" => [4, 5, 2]})
df.with_column(
  Polars.concat_list(["a", "b"]).list.eval(Polars.element.rank).alias("rank")
)
# =>
# shape: (3, 3)
# ┌─────┬─────┬────────────┐
# │ a   ┆ b   ┆ rank       │
# │ --- ┆ --- ┆ ---        │
# │ i64 ┆ i64 ┆ list[f64]  │
# ╞═════╪═════╪════════════╡
# │ 1   ┆ 4   ┆ [1.0, 2.0] │
# │ 8   ┆ 5   ┆ [2.0, 1.0] │
# │ 3   ┆ 2   ┆ [2.0, 1.0] │
# └─────┴─────┴────────────┘

.exclude(columns) ⇒ `Object` Originally defined in module LazyFunctions

Exclude certain columns from a wildcard/regex selection.

Examples:

df = Polars::DataFrame.new(
  {
    "aa" => [1, 2, 3],
    "ba" => ["a", "b", nil],
    "cc" => [nil, 2.5, 1.5]
  }
)
# =>
# shape: (3, 3)
# ┌─────┬──────┬──────┐
# │ aa  ┆ ba   ┆ cc   │
# │ --- ┆ ---  ┆ ---  │
# │ i64 ┆ str  ┆ f64  │
# ╞═════╪══════╪══════╡
# │ 1   ┆ a    ┆ null │
# │ 2   ┆ b    ┆ 2.5  │
# │ 3   ┆ null ┆ 1.5  │
# └─────┴──────┴──────┘

Exclude by column name(s):

df.select(Polars.exclude("ba"))
# =>
# shape: (3, 2)
# ┌─────┬──────┐
# │ aa  ┆ cc   │
# │ --- ┆ ---  │
# │ i64 ┆ f64  │
# ╞═════╪══════╡
# │ 1   ┆ null │
# │ 2   ┆ 2.5  │
# │ 3   ┆ 1.5  │
# └─────┴──────┘

Exclude by regex, e.g. removing all columns whose names end with the letter "a":

df.select(Polars.exclude("^.*a$"))
# =>
# shape: (3, 1)
# ┌──────┐
# │ cc   │
# │ ---  │
# │ f64  │
# ╞══════╡
# │ null │
# │ 2.5  │
# │ 1.5  │
# └──────┘

.first(column = nil) ⇒ `Object` Originally defined in module LazyFunctions

Get the first value.

.fold(acc, f, exprs) ⇒ `Expr` Originally defined in module LazyFunctions

Accumulate over multiple columns horizontally/row wise with a left fold.
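A horizontal left fold can be sketched in plain Ruby as follows. This only models the semantics with arrays standing in for columns; `fold_sketch` is a hypothetical helper and not how Polars evaluates the expression.

```ruby
# Plain-Ruby model of a row-wise left fold over several columns
# (hypothetical helper, not part of the Polars API).
def fold_sketch(acc, columns, &f)
  columns.first.each_index.map do |row|
    # Start from the accumulator and combine it with the value of each
    # column in this row, moving from the leftmost column to the right.
    columns.inject(acc) { |memo, col| f.call(memo, col[row]) }
  end
end

a = [1, 2, 3]
b = [3, 4, 5]
c = [5, 6, 7]
sums = fold_sketch(0, [a, b, c]) { |x, y| x + y }
puts sums.inspect
```

With addition as the folding function and 0 as the accumulator, each output value is the sum of that row across the three columns.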

.format(fstring, *args) ⇒ `Expr` Originally defined in module LazyFunctions

Format expressions as a string.

Examples:

df = Polars::DataFrame.new(
  {
    "a": ["a", "b", "c"],
    "b": [1, 2, 3]
  }
)
df.select(
  [
    Polars.format("foo_{}_bar_{}", Polars.col("a"), "b").alias("fmt")
  ]
)
# =>
# shape: (3, 1)
# ┌─────────────┐
# │ fmt         │
# │ ---         │
# │ str         │
# ╞═════════════╡
# │ foo_a_bar_1 │
# │ foo_b_bar_2 │
# │ foo_c_bar_3 │
# └─────────────┘

.from_epoch(column, unit: "s", eager: false) ⇒ `Object` Originally defined in module LazyFunctions

Utility function that parses an epoch timestamp (or Unix time) to Polars Date(time).

Depending on the unit provided, this function will return a different dtype:

unit: "d" returns pl.Date
unit: "s" returns pl.Datetime["us"]
unit: "ms" returns pl.Datetime["ms"]
unit: "us" returns pl.Datetime["us"]
unit: "ns" returns pl.Datetime["ns"]

Examples:

df = Polars::DataFrame.new({"timestamp" => [1666683077, 1666683099]}).lazy
df.select(Polars.from_epoch(Polars.col("timestamp"), unit: "s")).collect
# =>
# shape: (2, 1)
# ┌─────────────────────┐
# │ timestamp           │
# │ ---                 │
# │ datetime[μs]        │
# ╞═════════════════════╡
# │ 2022-10-25 07:31:17 │
# │ 2022-10-25 07:31:39 │
# └─────────────────────┘

.from_hash(data, schema: nil, columns: nil) ⇒ `DataFrame` Originally defined in module Convert

Construct a DataFrame from a hash of sequences.

This operation clones data, unless you pass in a Hash<String, Series>.

Examples:

data = {"a" => [1, 2], "b" => [3, 4]}
Polars.from_hash(data)
# =>
# shape: (2, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1   ┆ 3   │
# │ 2   ┆ 4   │
# └─────┴─────┘

.get_dummies(df, columns: nil) ⇒ `DataFrame` Originally defined in module Functions

Convert categorical variables into dummy/indicator variables.

.groups(column) ⇒ `Object` Originally defined in module LazyFunctions

Syntactic sugar for Polars.col("foo").agg_groups.

.head(column, n = 10) ⇒ `Object` Originally defined in module LazyFunctions

Get the first n rows.

.int_range(start, stop, step: 1, eager: false, dtype: nil) ⇒ `Expr`, `Series` Also known as: arange Originally defined in module LazyFunctions

Create a range expression (or Series).

This can be used in a select, with_column, etc. Be sure that the resulting range size is equal to the length of the DataFrame you are collecting.

Examples:

Polars.arange(0, 3, eager: true)
# =>
# shape: (3,)
# Series: 'arange' [i64]
# [
#         0
#         1
#         2
# ]

.last(column = nil) ⇒ `Object` Originally defined in module LazyFunctions

Get the last value.

Depending on the input type this function does different things:

nil -> expression to take last column of a context.
String -> syntactic sugar for Polars.col(..).last
Series -> Take last value in Series

.lit(value, dtype: nil, allow_object: nil) ⇒ `Expr` Originally defined in module LazyFunctions

Return an expression representing a literal value.

.max(column) ⇒ `Expr`, `Object` Originally defined in module LazyFunctions

Get the maximum value.

.mean(column) ⇒ `Expr`, `Float` Originally defined in module LazyFunctions

Get the mean value.

.median(column) ⇒ `Object` Originally defined in module LazyFunctions

Get the median value.

.min(column) ⇒ `Expr`, `Object` Originally defined in module LazyFunctions

Get the minimum value.

.n_unique(column) ⇒ `Object` Originally defined in module LazyFunctions

Count unique values.

.ones(n, dtype: nil) ⇒ `Series` Originally defined in module Functions

Note:

In the lazy API you should probably not use this, but use lit(1) instead.

Return a new Series of given length and type, filled with ones.

.pearson_corr(a, b, ddof: 1) ⇒ `Expr` Originally defined in module LazyFunctions

Compute the Pearson correlation between two columns.

.quantile(column, quantile, interpolation: "nearest") ⇒ `Expr` Originally defined in module LazyFunctions

Syntactic sugar for Polars.col("foo").quantile(...).

.read_avro(source, columns: nil, n_rows: nil) ⇒ `DataFrame` Originally defined in module IO

Read into a DataFrame from Apache Avro format.

.read_csv(source, has_header: true, columns: nil, new_columns: nil, sep: ",", comment_char: nil, quote_char: '"', skip_rows: 0, dtypes: nil, null_values: nil, ignore_errors: false, parse_dates: false, n_threads: nil, infer_schema_length: 100, batch_size: 8192, n_rows: nil, encoding: "utf8", low_memory: false, rechunk: true, storage_options: nil, skip_rows_after_header: 0, row_count_name: nil, row_count_offset: 0, sample_size: 1024, eol_char: "\n") ⇒ `DataFrame` Originally defined in module IO

Note:

This operation defaults to a rechunk operation at the end, meaning that all data will be stored contiguously in memory. Set rechunk: false if you are benchmarking the CSV reader. A rechunk is an expensive operation.

Read a CSV file into a DataFrame.
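The cost mentioned in the rechunk note can be pictured with a small plain-Ruby model. It assumes a column is held as a list of separately allocated chunks; the variable names are hypothetical and nothing here calls into Polars.

```ruby
# Model a column that was read as several separate chunks.
chunks = [[1, 2, 3], [4, 5], [6]]

# rechunk: false -> the chunks are kept as they are; nothing is copied.
unrechunked = chunks

# rechunk: true -> every value is copied into one contiguous buffer,
# which is why the step is expensive for large inputs.
rechunked = [chunks.flatten(1)]

puts unrechunked.length
puts rechunked.length
```

Both layouts hold the same values; only the number of underlying buffers differs.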

.read_csv_batched(source, has_header: true, columns: nil, new_columns: nil, sep: ",", comment_char: nil, quote_char: '"', skip_rows: 0, dtypes: nil, null_values: nil, ignore_errors: false, parse_dates: false, n_threads: nil, infer_schema_length: 100, batch_size: 50_000, n_rows: nil, encoding: "utf8", low_memory: false, rechunk: true, skip_rows_after_header: 0, row_count_name: nil, row_count_offset: 0, sample_size: 1024, eol_char: "\n") ⇒ `BatchedCsvReader` Originally defined in module IO

Read a CSV file in batches.

Upon creation of the BatchedCsvReader, Polars will gather statistics and determine the file chunks. After that, work is only done when next_batches is called.

Examples:

reader = Polars.read_csv_batched(
  "./tpch/tables_scale_100/lineitem.tbl", sep: "|", parse_dates: true
)
reader.next_batches(5)

.read_database(query) ⇒ `DataFrame` Also known as: read_sql Originally defined in module IO

Read a SQL query into a DataFrame.

.read_ipc(source, columns: nil, n_rows: nil, memory_map: true, storage_options: nil, row_count_name: nil, row_count_offset: 0, rechunk: true) ⇒ `DataFrame` Originally defined in module IO

Read into a DataFrame from Arrow IPC (Feather v2) file.

.read_ipc_schema(source) ⇒ `Hash` Originally defined in module IO

Get a schema of the IPC file without reading data.

.read_json(source) ⇒ `DataFrame` Originally defined in module IO

Read into a DataFrame from a JSON file.

.read_ndjson(source) ⇒ `DataFrame` Originally defined in module IO

Read into a DataFrame from a newline delimited JSON file.

.read_parquet(source, columns: nil, n_rows: nil, storage_options: nil, parallel: "auto", row_count_name: nil, row_count_offset: 0, low_memory: false, use_statistics: true, rechunk: true) ⇒ `DataFrame` Originally defined in module IO

Note:

This operation defaults to a rechunk operation at the end, meaning that all data will be stored contiguously in memory. Set rechunk: false if you are benchmarking the Parquet reader. A rechunk is an expensive operation.

Read into a DataFrame from a parquet file.

.read_parquet_schema(source) ⇒ `Hash` Originally defined in module IO

Get a schema of the Parquet file without reading data.

.repeat(value, n, dtype: nil, eager: false, name: nil) ⇒ `Expr` Originally defined in module LazyFunctions

Repeat a single value n times.

.scan_csv(source, has_header: true, sep: ",", comment_char: nil, quote_char: '"', skip_rows: 0, dtypes: nil, null_values: nil, ignore_errors: false, cache: true, with_column_names: nil, infer_schema_length: 100, n_rows: nil, encoding: "utf8", low_memory: false, rechunk: true, skip_rows_after_header: 0, row_count_name: nil, row_count_offset: 0, parse_dates: false, eol_char: "\n") ⇒ `LazyFrame` Originally defined in module IO

Lazily read from a CSV file or multiple files via glob patterns.

This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.
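To see why pushing predicates and projections down to the scan can reduce memory use, consider this plain-Ruby analogy built on a lazy enumerator. It is a conceptual sketch only: the in-memory CSV text, the column names and the threshold are made up for illustration, and the code does not use or reproduce the Polars query optimizer.

```ruby
require "stringio"

# Hypothetical CSV content standing in for a file on disk.
source = StringIO.new("id,name,score\n1,ann,10\n2,bob,55\n3,cy,90\n")

header = source.gets.chomp.split(",")
score_idx = header.index("score")
name_idx = header.index("name")

# Rows are filtered and trimmed while they are being read, so rows that
# fail the predicate and columns that are not selected are never kept.
names = source.each_line.lazy
  .map { |line| line.chomp.split(",") }
  .select { |fields| fields[score_idx].to_i > 50 } # predicate at scan time
  .map { |fields| fields[name_idx] }               # projection at scan time
  .to_a

puts names.inspect
```

An eager read would first materialize all three rows and all three columns, then filter and select afterwards.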

.scan_ipc(source, n_rows: nil, cache: true, rechunk: true, row_count_name: nil, row_count_offset: 0, storage_options: nil, memory_map: true) ⇒ `LazyFrame` Originally defined in module IO

Lazily read from an Arrow IPC (Feather v2) file or multiple files via glob patterns.

This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.

.scan_ndjson(source, infer_schema_length: 100, batch_size: 1024, n_rows: nil, low_memory: false, rechunk: true, row_count_name: nil, row_count_offset: 0) ⇒ `LazyFrame` Originally defined in module IO

Lazily read from a newline delimited JSON file.

This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.

.scan_parquet(source, n_rows: nil, cache: true, parallel: "auto", rechunk: true, row_count_name: nil, row_count_offset: 0, storage_options: nil, low_memory: false) ⇒ `LazyFrame` Originally defined in module IO

Lazily read from a Parquet file or multiple files via glob patterns.

This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.

.select(exprs) ⇒ `DataFrame` Originally defined in module LazyFunctions

Run Polars expressions without a context.

.spearman_rank_corr(a, b, ddof: 1, propagate_nans: false) ⇒ `Expr` Originally defined in module LazyFunctions

Compute the Spearman rank correlation between two columns.

Missing data will be excluded from the computation.

.std(column, ddof: 1) ⇒ `Object` Originally defined in module LazyFunctions

Get the standard deviation.

.struct(exprs, eager: false) ⇒ `Object` Originally defined in module LazyFunctions

Collect several columns into a Series of dtype Struct.

Examples:

Polars::DataFrame.new(
  {
    "int" => [1, 2],
    "str" => ["a", "b"],
    "bool" => [true, nil],
    "list" => [[1, 2], [3]],
  }
).select([Polars.struct(Polars.all).alias("my_struct")])
# =>
# shape: (2, 1)
# ┌─────────────────────┐
# │ my_struct           │
# │ ---                 │
# │ struct[4]           │
# ╞═════════════════════╡
# │ {1,"a",true,[1, 2]} │
# │ {2,"b",null,[3]}    │
# └─────────────────────┘

Only collect specific columns as a struct:

df = Polars::DataFrame.new(
  {"a" => [1, 2, 3, 4], "b" => ["one", "two", "three", "four"], "c" => [9, 8, 7, 6]}
)
df.with_column(Polars.struct(Polars.col(["a", "b"])).alias("a_and_b"))
# =>
# shape: (4, 4)
# ┌─────┬───────┬─────┬─────────────┐
# │ a   ┆ b     ┆ c   ┆ a_and_b     │
# │ --- ┆ ---   ┆ --- ┆ ---         │
# │ i64 ┆ str   ┆ i64 ┆ struct[2]   │
# ╞═════╪═══════╪═════╪═════════════╡
# │ 1   ┆ one   ┆ 9   ┆ {1,"one"}   │
# │ 2   ┆ two   ┆ 8   ┆ {2,"two"}   │
# │ 3   ┆ three ┆ 7   ┆ {3,"three"} │
# │ 4   ┆ four  ┆ 6   ┆ {4,"four"}  │
# └─────┴───────┴─────┴─────────────┘

.sum(column) ⇒ `Object` Originally defined in module LazyFunctions

Sum values in a column/Series, or horizontally across a list of columns/expressions.

.tail(column, n = 10) ⇒ `Object` Originally defined in module LazyFunctions

Get the last n rows.

.to_list(name) ⇒ `Expr` Originally defined in module LazyFunctions

Aggregate to list.

.var(column, ddof: 1) ⇒ `Object` Originally defined in module LazyFunctions

Get the variance.

.when(expr) ⇒ `When` Originally defined in module LazyFunctions

Start a "when, then, otherwise" expression.

Examples:

df = Polars::DataFrame.new({"foo" => [1, 3, 4], "bar" => [3, 4, 0]})
df.with_column(Polars.when(Polars.col("foo") > 2).then(Polars.lit(1)).otherwise(Polars.lit(-1)))
# =>
# shape: (3, 3)
# ┌─────┬─────┬─────────┐
# │ foo ┆ bar ┆ literal │
# │ --- ┆ --- ┆ ---     │
# │ i64 ┆ i64 ┆ i32     │
# ╞═════╪═════╪═════════╡
# │ 1   ┆ 3   ┆ -1      │
# │ 3   ┆ 4   ┆ 1       │
# │ 4   ┆ 0   ┆ 1       │
# └─────┴─────┴─────────┘

.zeros(n, dtype: nil) ⇒ `Series` Originally defined in module Functions

Note:

In the lazy API, you should probably use lit(0) instead of this method.

Return a new Series of given length and type, filled with zeros.
