Class: Polars::LazyFrame
- Inherits: Object
  - Object
    - Polars::LazyFrame
- Defined in: lib/polars/lazy_frame.rb
Overview
Representation of a lazy computation graph/query against a DataFrame.
Class Method Summary
-
.read_json(file) ⇒ LazyFrame
Read a logical plan from a JSON file to construct a LazyFrame.
Instance Method Summary
-
#cache ⇒ LazyFrame
Cache the result once the execution of the physical plan hits this node.
-
#cleared ⇒ LazyFrame
Create an empty copy of the current LazyFrame.
-
#collect(type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false, _eager: false) ⇒ DataFrame
Collect into a DataFrame.
-
#columns ⇒ Array
Get the column names.
-
#describe_optimized_plan(type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ String
Create a string representation of the optimized query plan.
-
#describe_plan ⇒ String
Create a string representation of the unoptimized query plan.
-
#drop(columns) ⇒ LazyFrame
Remove one or multiple columns from a DataFrame.
-
#drop_nulls(subset: nil) ⇒ LazyFrame
Drop rows with null values from this LazyFrame.
-
#dtypes ⇒ Array
Get dtypes of columns in LazyFrame.
-
#explode(columns) ⇒ LazyFrame
Explode lists to long format.
-
#fetch(n_rows = 500, type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ DataFrame
Collect a small number of rows for debugging purposes.
-
#fill_nan(fill_value) ⇒ LazyFrame
Fill floating point NaN values.
-
#fill_null(value = nil, strategy: nil, limit: nil, matches_supertype: nil) ⇒ LazyFrame
Fill null values using the specified value or strategy.
-
#filter(predicate) ⇒ LazyFrame
Filter the rows in the DataFrame based on a predicate expression.
-
#first ⇒ LazyFrame
Get the first row of the DataFrame.
-
#group_by(by, maintain_order: false) ⇒ LazyGroupBy
(also: #groupby, #group)
Start a group by operation.
-
#group_by_dynamic(index_column, every:, period: nil, offset: nil, truncate: nil, include_boundaries: false, closed: "left", label: "left", by: nil, start_by: "window", check_sorted: true) ⇒ LazyGroupBy
(also: #groupby_dynamic)
Group based on a time value (or index value of type :i32 or :i64).
-
#group_by_rolling(index_column:, period:, offset: nil, closed: "right", by: nil, check_sorted: true) ⇒ LazyGroupBy
(also: #groupby_rolling)
Create rolling groups based on a time column.
-
#head(n = 5) ⇒ LazyFrame
Get the first n rows.
-
#include?(key) ⇒ Boolean
Check if LazyFrame includes key.
-
#initialize(data = nil, schema: nil, schema_overrides: nil, orient: nil, infer_schema_length: 100, nan_to_null: false) ⇒ LazyFrame (constructor)
Create a new LazyFrame.
-
#interpolate ⇒ LazyFrame
Interpolate intermediate values.
-
#join(other, left_on: nil, right_on: nil, on: nil, how: "inner", suffix: "_right", allow_parallel: true, force_parallel: false) ⇒ LazyFrame
Add a join operation to the Logical Plan.
-
#join_asof(other, left_on: nil, right_on: nil, on: nil, by_left: nil, by_right: nil, by: nil, strategy: "backward", suffix: "_right", tolerance: nil, allow_parallel: true, force_parallel: false) ⇒ LazyFrame
Perform an asof join.
-
#last ⇒ LazyFrame
Get the last row of the DataFrame.
-
#lazy ⇒ LazyFrame
Return lazy representation, i.e. itself.
-
#limit(n = 5) ⇒ LazyFrame
Get the first n rows.
-
#max ⇒ LazyFrame
Aggregate the columns in the DataFrame to their maximum value.
-
#mean ⇒ LazyFrame
Aggregate the columns in the DataFrame to their mean value.
-
#median ⇒ LazyFrame
Aggregate the columns in the DataFrame to their median value.
-
#melt(id_vars: nil, value_vars: nil, variable_name: nil, value_name: nil, streamable: true) ⇒ LazyFrame
Unpivot a DataFrame from wide to long format.
-
#min ⇒ LazyFrame
Aggregate the columns in the DataFrame to their minimum value.
-
#pipe(func, *args, **kwargs, &block) ⇒ LazyFrame
Offers a structured way to apply a sequence of user-defined functions (UDFs).
-
#quantile(quantile, interpolation: "nearest") ⇒ LazyFrame
Aggregate the columns in the DataFrame to their quantile value.
-
#rename(mapping) ⇒ LazyFrame
Rename column names.
-
#reverse ⇒ LazyFrame
Reverse the DataFrame.
-
#schema ⇒ Hash
Get the schema.
-
#select(exprs) ⇒ LazyFrame
Select columns from this DataFrame.
-
#set_sorted(column, *more_columns, descending: false) ⇒ LazyFrame
Indicate that one or multiple columns are sorted.
-
#shift(n, fill_value: nil) ⇒ LazyFrame
Shift the values by a given period.
-
#shift_and_fill(periods, fill_value) ⇒ LazyFrame
Shift the values by a given period and fill the resulting null values.
-
#sink_parquet(path, compression: "zstd", compression_level: nil, statistics: false, row_group_size: nil, data_pagesize_limit: nil, maintain_order: true, type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, no_optimization: false, slice_pushdown: true) ⇒ DataFrame
Persists a LazyFrame at the provided path.
-
#slice(offset, length = nil) ⇒ LazyFrame
Get a slice of this DataFrame.
-
#sort(by, reverse: false, nulls_last: false, maintain_order: false) ⇒ LazyFrame
Sort the DataFrame.
-
#std(ddof: 1) ⇒ LazyFrame
Aggregate the columns in the DataFrame to their standard deviation value.
-
#sum ⇒ LazyFrame
Aggregate the columns in the DataFrame to their sum value.
-
#tail(n = 5) ⇒ LazyFrame
Get the last n rows.
-
#take_every(n) ⇒ LazyFrame
Take every nth row in the LazyFrame and return as a new LazyFrame.
-
#to_s ⇒ String
Returns a string representing the LazyFrame.
-
#unique(maintain_order: true, subset: nil, keep: "first") ⇒ LazyFrame
Drop duplicate rows from this DataFrame.
-
#unnest(names) ⇒ LazyFrame
Decompose a struct into its fields.
-
#var(ddof: 1) ⇒ LazyFrame
Aggregate the columns in the DataFrame to their variance value.
-
#width ⇒ Integer
Get the width of the LazyFrame.
-
#with_column(column) ⇒ LazyFrame
Add or overwrite column in a DataFrame.
-
#with_columns(exprs) ⇒ LazyFrame
Add or overwrite multiple columns in a DataFrame.
-
#with_context(other) ⇒ LazyFrame
Add an external context to the computation graph.
-
#with_row_count(name: "row_nr", offset: 0) ⇒ LazyFrame
Add a column at index 0 that counts the rows.
-
#write_json(file) ⇒ nil
Write the logical plan of this LazyFrame to a file or string in JSON format.
Constructor Details
#initialize(data = nil, schema: nil, schema_overrides: nil, orient: nil, infer_schema_length: 100, nan_to_null: false) ⇒ LazyFrame
Create a new LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 8
def initialize(data = nil, schema: nil, schema_overrides: nil, orient: nil, infer_schema_length: 100, nan_to_null: false)
  self._ldf = (
    DataFrame.new(
      data,
      schema: schema,
      schema_overrides: schema_overrides,
      orient: orient,
      infer_schema_length: infer_schema_length,
      nan_to_null: nan_to_null
    )
      .lazy
      ._ldf
  )
end
Class Method Details
.read_json(file) ⇒ LazyFrame
Read a logical plan from a JSON file to construct a LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 178
def self.read_json(file)
  if Utils.pathlike?(file)
    file = Utils.normalise_filepath(file)
  end
  Utils.wrap_ldf(RbLazyFrame.read_json(file))
end
Instance Method Details
#cache ⇒ LazyFrame
Cache the result once the execution of the physical plan hits this node.
# File 'lib/polars/lazy_frame.rb', line 698
def cache
  _from_rbldf(_ldf.cache)
end
#cleared ⇒ LazyFrame
Create an empty copy of the current LazyFrame.
The copy has an identical schema but no data.
# File 'lib/polars/lazy_frame.rb', line 725
def cleared
  DataFrame.new(columns: schema).lazy
end
#collect(type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false, _eager: false) ⇒ DataFrame
Collect into a DataFrame.
Note: use #fetch if you want to run your query on the first n rows only. This can be a huge time saver when debugging queries.
# File 'lib/polars/lazy_frame.rb', line 463
def collect(
  type_coercion: true,
  predicate_pushdown: true,
  projection_pushdown: true,
  simplify_expression: true,
  string_cache: false,
  no_optimization: false,
  slice_pushdown: true,
  common_subplan_elimination: true,
  allow_streaming: false,
  _eager: false
)
  if no_optimization
    predicate_pushdown = false
    projection_pushdown = false
    slice_pushdown = false
    common_subplan_elimination = false
  end
  if allow_streaming
    common_subplan_elimination = false
  end
  ldf = _ldf.optimization_toggle(
    type_coercion,
    predicate_pushdown,
    projection_pushdown,
    simplify_expression,
    slice_pushdown,
    common_subplan_elimination,
    allow_streaming,
    _eager
  )
  Utils.wrap_df(ldf.collect)
end
#columns ⇒ Array
Get the column names.
# File 'lib/polars/lazy_frame.rb', line 204
def columns
  _ldf.columns
end
#describe_optimized_plan(type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ String
Create a string representation of the optimized query plan.
# File 'lib/polars/lazy_frame.rb', line 338
def describe_optimized_plan(
  type_coercion: true,
  predicate_pushdown: true,
  projection_pushdown: true,
  simplify_expression: true,
  slice_pushdown: true,
  common_subplan_elimination: true,
  allow_streaming: false
)
  ldf = _ldf.optimization_toggle(
    type_coercion,
    predicate_pushdown,
    projection_pushdown,
    simplify_expression,
    slice_pushdown,
    common_subplan_elimination,
    allow_streaming,
    false
  )
  ldf.describe_optimized_plan
end
#describe_plan ⇒ String
Create a string representation of the unoptimized query plan.
# File 'lib/polars/lazy_frame.rb', line 331
def describe_plan
  _ldf.describe_plan
end
#drop(columns) ⇒ LazyFrame
Remove one or multiple columns from a DataFrame.
# File 'lib/polars/lazy_frame.rb', line 1724
def drop(columns)
  if columns.is_a?(::String)
    columns = [columns]
  end
  _from_rbldf(_ldf.drop_columns(columns))
end
#drop_nulls(subset: nil) ⇒ LazyFrame
Drop rows with null values from this LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 2310
def drop_nulls(subset: nil)
  if !subset.nil? && !subset.is_a?(::Array)
    subset = [subset]
  end
  _from_rbldf(_ldf.drop_nulls(subset))
end
#dtypes ⇒ Array
Get dtypes of columns in LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 222
def dtypes
  _ldf.dtypes
end
#explode(columns) ⇒ LazyFrame
Explode lists to long format.
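Exploding turns each element of a list column into its own row and repeats the values of the other columns. The plain-Ruby sketch below models that on an array of row hashes; `explode_rows` is a hypothetical helper, not part of polars-ruby, and treating an empty or nil list as a single row holding nil is an assumption of the sketch.

```ruby
# Hypothetical helper (not part of polars-ruby): models LazyFrame#explode
# on an array of row hashes. Each element of the list column becomes its
# own row; all other columns are repeated.
def explode_rows(rows, column)
  rows.flat_map do |row|
    items = row[column]
    # Assumption: an empty or nil list yields one row with nil.
    items = [nil] if items.nil? || items.empty?
    items.map { |item| row.merge(column => item) }
  end
end

rows = [
  { "id" => 1, "tags" => ["a", "b"] },
  { "id" => 2, "tags" => [] }
]
exploded = explode_rows(rows, "tags")
# exploded == [{"id"=>1, "tags"=>"a"}, {"id"=>1, "tags"=>"b"}, {"id"=>2, "tags"=>nil}]
```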
# File 'lib/polars/lazy_frame.rb', line 2258
def explode(columns)
  columns = Utils.selection_to_rbexpr_list(columns)
  _from_rbldf(_ldf.explode(columns))
end
#fetch(n_rows = 500, type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ DataFrame
Collect a small number of rows for debugging purposes.
Fetch is like a #collect operation, but it overwrites the number of rows read by every scan operation. This is a utility that helps debug a query on a smaller number of rows.
Note that #fetch does not guarantee the final number of rows in the DataFrame. Filters, joins, and a scanned file that holds fewer rows than requested all influence the final number of rows.
# File 'lib/polars/lazy_frame.rb', line 643
def fetch(
  n_rows = 500,
  type_coercion: true,
  predicate_pushdown: true,
  projection_pushdown: true,
  simplify_expression: true,
  string_cache: false,
  no_optimization: false,
  slice_pushdown: true,
  common_subplan_elimination: true,
  allow_streaming: false
)
  if no_optimization
    predicate_pushdown = false
    projection_pushdown = false
    slice_pushdown = false
    common_subplan_elimination = false
  end
  ldf = _ldf.optimization_toggle(
    type_coercion,
    predicate_pushdown,
    projection_pushdown,
    simplify_expression,
    slice_pushdown,
    common_subplan_elimination,
    allow_streaming,
    false
  )
  Utils.wrap_df(ldf.fetch(n_rows))
end
#fill_nan(fill_value) ⇒ LazyFrame
Fill floating point NaN values.
Note that floating point NaN (Not a Number) values are not missing values. To replace missing values, use #fill_null instead.
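The distinction between NaN and a missing value can be made concrete with plain Ruby floats, where `Float::NAN` stands for NaN and `nil` for null. The helpers below are hypothetical (not part of polars-ruby) and only model which values each kind of fill touches.

```ruby
# Hypothetical helpers (not part of polars-ruby) modeling the difference
# between NaN (a floating point value) and nil (a missing value).
def nan_value?(value)
  value.is_a?(Float) && value.nan?
end

# Like fill_nan: replaces NaN only; nil stays missing.
def fill_nan_values(values, fill_value)
  values.map { |v| nan_value?(v) ? fill_value : v }
end

# Like fill_null with a constant: replaces nil only; NaN is left alone.
def fill_null_values(values, fill_value)
  values.map { |v| v.nil? ? fill_value : v }
end

column = [1.0, Float::NAN, nil, 4.0]
after_nan = fill_nan_values(column, 0.0)   # [1.0, 0.0, nil, 4.0]
after_null = fill_null_values(column, 0.0) # [1.0, NaN, 0.0, 4.0]
```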
# File 'lib/polars/lazy_frame.rb', line 2033
def fill_nan(fill_value)
  if !fill_value.is_a?(Expr)
    fill_value = Utils.lit(fill_value)
  end
  _from_rbldf(_ldf.fill_nan(fill_value._rbexpr))
end
#fill_null(value = nil, strategy: nil, limit: nil, matches_supertype: nil) ⇒ LazyFrame
Fill null values using the specified value or strategy.
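As an illustration of the `strategy` and `limit` parameters, the sketch below models `strategy: "forward"` on a plain array: each null takes the last non-null value before it, and `limit` caps how many consecutive nulls are filled. `forward_fill` is a hypothetical helper, not part of polars-ruby, and it is a model of the semantics rather than the library's implementation.

```ruby
# Hypothetical helper (not part of polars-ruby): forward fill on a plain
# array (nil = null). `limit` caps how many consecutive nulls are filled.
def forward_fill(values, limit: nil)
  last_seen = nil
  run = 0 # length of the current run of nulls
  values.map do |v|
    if v.nil?
      run += 1
      (limit.nil? || run <= limit) ? last_seen : nil
    else
      last_seen = v
      run = 0
      v
    end
  end
end

forward_fill([1, nil, nil, nil, 5, nil])           # => [1, 1, 1, 1, 5, 5]
forward_fill([1, nil, nil, nil, 5, nil], limit: 2) # => [1, 1, 1, nil, 5, 5]
```

Leading nulls stay nil because there is no earlier value to carry forward.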
# File 'lib/polars/lazy_frame.rb', line 1998
def fill_null(value = nil, strategy: nil, limit: nil, matches_supertype: nil)
  select(Polars.all.fill_null(value, strategy: strategy, limit: limit))
end
#filter(predicate) ⇒ LazyFrame
Filter the rows in the DataFrame based on a predicate expression.
# File 'lib/polars/lazy_frame.rb', line 767
def filter(predicate)
  _from_rbldf(
    _ldf.filter(
      Utils.expr_to_lit_or_expr(predicate, str_to_lit: false)._rbexpr
    )
  )
end
#first ⇒ LazyFrame
Get the first row of the DataFrame.
# File 'lib/polars/lazy_frame.rb', line 1934
def first
  slice(0, 1)
end
#group_by(by, maintain_order: false) ⇒ LazyGroupBy
Also known as: groupby, group
Start a group by operation.
# File 'lib/polars/lazy_frame.rb', line 893
def group_by(by, maintain_order: false)
  rbexprs_by = Utils.selection_to_rbexpr_list(by)
  lgb = _ldf.group_by(rbexprs_by, maintain_order)
  LazyGroupBy.new(lgb)
end
#group_by_dynamic(index_column, every:, period: nil, offset: nil, truncate: nil, include_boundaries: false, closed: "left", label: "left", by: nil, start_by: "window", check_sorted: true) ⇒ LazyGroupBy
Also known as: groupby_dynamic
Group based on a time value (or index value of type :i32, :i64).
Time windows are calculated and rows are assigned to windows. Unlike a normal group by, a row can be a member of multiple groups. The time/index window can be seen as a rolling window whose size is determined by dates/times/values instead of slots in the DataFrame.
A window is defined by:
- every: interval of the window
- period: length of the window
- offset: offset of the window
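To see why a row can land in several groups, consider a simplified model, which is an assumption and not the library's exact algorithm: an integer index, window starts spaced `every` apart (shifted by `offset`), each window covering [start, start + period), and empty windows dropped. `dynamic_windows` is a hypothetical helper. Whenever `period` is larger than `every`, consecutive windows overlap.

```ruby
# Hypothetical, simplified model (not polars-ruby's algorithm): integer
# index, a window starts every `every` units (shifted by `offset`) and
# covers [start, start + period). Empty windows are dropped.
def dynamic_windows(index, every:, period: every, offset: 0)
  # Align the first start to a multiple of `every`.
  start = (index.min / every) * every + offset
  windows = {}
  while start <= index.max
    stop = start + period
    members = index.each_index.select { |i| index[i] >= start && index[i] < stop }
    windows[start] = members unless members.empty?
    start += every
  end
  windows
end

# period > every, so consecutive windows overlap and the rows at
# positions 2 through 5 each fall into two windows.
windows = dynamic_windows([0, 1, 2, 3, 4, 5], every: 2, period: 4)
# windows == {0=>[0, 1, 2, 3], 2=>[2, 3, 4, 5], 4=>[4, 5]}
```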
The every, period and offset arguments are created with
the following string language:
- 1ns (1 nanosecond)
- 1us (1 microsecond)
- 1ms (1 millisecond)
- 1s (1 second)
- 1m (1 minute)
- 1h (1 hour)
- 1d (1 day)
- 1w (1 week)
- 1mo (1 calendar month)
- 1y (1 calendar year)
- 1i (1 index count)
Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds
In case of a group_by_dynamic on an integer column, the windows are defined by:
- "1i" # length 1
- "10i" # length 10
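The duration language above is a sequence of integer/unit pairs. The sketch below tokenizes such strings and, for fixed-length units only, sums them into nanoseconds. `parse_duration`, `duration_to_ns`, `UNIT_NS` and `TOKEN` are hypothetical names, not polars-ruby API; calendar units ("mo", "y") and index counts ("i") are parsed but have no fixed length, so converting them is rejected.

```ruby
# Hypothetical helpers (not part of polars-ruby) modeling the duration
# string language described above.

# Nanoseconds per fixed-length unit. "mo", "y" (calendar) and "i" (index
# count) have no fixed length, so they are deliberately absent.
UNIT_NS = {
  "ns" => 1,
  "us" => 1_000,
  "ms" => 1_000_000,
  "s" => 1_000_000_000,
  "m" => 60 * 1_000_000_000,
  "h" => 3_600 * 1_000_000_000,
  "d" => 86_400 * 1_000_000_000,
  "w" => 7 * 86_400 * 1_000_000_000
}.freeze

# Multi-letter units are listed first so "ms" and "mo" are not read as "m".
TOKEN = /(\d+)(ns|us|ms|mo|s|m|h|d|w|y|i)/

# Split e.g. "3d12h4m25s" into [[3, "d"], [12, "h"], [4, "m"], [25, "s"]].
def parse_duration(str)
  tokens = str.scan(TOKEN)
  consumed = tokens.sum { |digits, unit| digits.length + unit.length }
  if tokens.empty? || consumed != str.length
    raise ArgumentError, "invalid duration: #{str.inspect}"
  end
  tokens.map { |digits, unit| [Integer(digits, 10), unit] }
end

# Total length in nanoseconds for durations made only of fixed-length units.
def duration_to_ns(str)
  parse_duration(str).sum do |count, unit|
    count * UNIT_NS.fetch(unit) { raise ArgumentError, "#{unit} has no fixed length" }
  end
end

parse_duration("3d12h4m25s") # => [[3, "d"], [12, "h"], [4, "m"], [25, "s"]]
duration_to_ns("3d12h4m25s") # => 302665000000000
```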
# File 'lib/polars/lazy_frame.rb', line 1247
def group_by_dynamic(
  index_column,
  every:,
  period: nil,
  offset: nil,
  truncate: nil,
  include_boundaries: false,
  closed: "left",
  label: "left",
  by: nil,
  start_by: "window",
  check_sorted: true
)
  if !truncate.nil?
    label = truncate ? "left" : "datapoint"
  end
  index_column = Utils.expr_to_lit_or_expr(index_column, str_to_lit: false)
  if offset.nil?
    offset = period.nil? ? "-#{every}" : "0ns"
  end
  if period.nil?
    period = every
  end
  period = Utils._timedelta_to_pl_duration(period)
  offset = Utils._timedelta_to_pl_duration(offset)
  every = Utils._timedelta_to_pl_duration(every)
  rbexprs_by = by.nil? ? [] : Utils.selection_to_rbexpr_list(by)
  lgb = _ldf.group_by_dynamic(
    index_column._rbexpr,
    every,
    period,
    offset,
    label,
    include_boundaries,
    closed,
    rbexprs_by,
    start_by,
    check_sorted
  )
  LazyGroupBy.new(lgb)
end
#group_by_rolling(index_column:, period:, offset: nil, closed: "right", by: nil, check_sorted: true) ⇒ LazyGroupBy
Also known as: groupby_rolling
Create rolling groups based on a time column.
Also works for index values of type :i32 or :i64.
Unlike group_by_dynamic, the windows are determined by the individual values and are not of constant intervals. For constant intervals, use group_by_dynamic.
The period and offset arguments are created either from a timedelta, or
by using the following string language:
- 1ns (1 nanosecond)
- 1us (1 microsecond)
- 1ms (1 millisecond)
- 1s (1 second)
- 1m (1 minute)
- 1h (1 hour)
- 1d (1 day)
- 1w (1 week)
- 1mo (1 calendar month)
- 1y (1 calendar year)
- 1i (1 index count)
Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds
In case of a group_by_rolling on an integer column, the windows are defined by:
- "1i" # length 1
- "10i" # length 10
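The code below sets `offset` to `-period` when it is omitted, and `closed` defaults to "right", so each row's window is (t - period, t], where t is that row's index value. The sketch models that selection on an integer index; `rolling_groups` and `in_window?` are hypothetical helpers, not polars-ruby API. Note how window sizes vary with the data instead of following constant intervals.

```ruby
# Hypothetical helpers (not part of polars-ruby) modeling how rolling
# windows are picked on an integer index.

# Is x inside the interval between lo and hi, honoring `closed`
# ("right", "left", "both" or "none")?
def in_window?(x, lo, hi, closed)
  lower_ok = %w[left both].include?(closed) ? x >= lo : x > lo
  upper_ok = %w[right both].include?(closed) ? x <= hi : x < hi
  lower_ok && upper_ok
end

# One window per row: it starts at t + offset and spans `period`, so with
# the defaults (offset = -period, closed: "right") it is (t - period, t].
# Returns the row positions that fall into each window.
def rolling_groups(index, period:, offset: -period, closed: "right")
  index.map do |t|
    lo = t + offset
    hi = lo + period
    index.each_index.select { |j| in_window?(index[j], lo, hi, closed) }
  end
end

index = [1, 2, 3, 5, 6]
groups = rolling_groups(index, period: 2)
# groups == [[0], [0, 1], [1, 2], [3], [3, 4]]
```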
# File 'lib/polars/lazy_frame.rb', line 991
def group_by_rolling(
  index_column:,
  period:,
  offset: nil,
  closed: "right",
  by: nil,
  check_sorted: true
)
  index_column = Utils.parse_as_expression(index_column)
  if offset.nil?
    offset = "-#{period}"
  end
  rbexprs_by = by.nil? ? [] : Utils.selection_to_rbexpr_list(by)
  period = Utils._timedelta_to_pl_duration(period)
  offset = Utils._timedelta_to_pl_duration(offset)
  lgb = _ldf.group_by_rolling(
    index_column, period, offset, closed, rbexprs_by, check_sorted
  )
  LazyGroupBy.new(lgb)
end
#head(n = 5) ⇒ LazyFrame
Get the first n rows.
# File 'lib/polars/lazy_frame.rb', line 1910
def head(n = 5)
  slice(0, n)
end
#include?(key) ⇒ Boolean
Check if LazyFrame includes key.
# File 'lib/polars/lazy_frame.rb', line 259
def include?(key)
  columns.include?(key)
end
#interpolate ⇒ LazyFrame
Interpolate intermediate values. The interpolation method is linear.
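Linear interpolation places each run of nulls on the straight line between the nearest non-null neighbors. The sketch below does this on a plain array where nil marks a missing value; `interpolate_linear` is a hypothetical helper, and leaving leading and trailing nils untouched (they have a neighbor on one side only) is an assumption of the sketch.

```ruby
# Hypothetical helper (not part of polars-ruby): linear interpolation over
# a plain array where nil marks a missing value. Leading and trailing nils
# have a neighbor on one side only, so they are left as nil (assumption).
def interpolate_linear(values)
  out = values.dup
  known = values.each_index.reject { |i| values[i].nil? }
  # Fill the gap between each pair of consecutive known positions.
  known.each_cons(2) do |i, j|
    (i + 1...j).each do |k|
      out[k] = values[i] + ((values[j] - values[i]) * (k - i)).fdiv(j - i)
    end
  end
  out
end

interpolate_linear([1, nil, 3, nil, nil, 6, nil]) # => [1, 2.0, 3, 4.0, 5.0, 6, nil]
```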
# File 'lib/polars/lazy_frame.rb', line 2411
def interpolate
  select(Utils.col("*").interpolate)
end
#join(other, left_on: nil, right_on: nil, on: nil, how: "inner", suffix: "_right", allow_parallel: true, force_parallel: false) ⇒ LazyFrame
Add a join operation to the Logical Plan.
# File 'lib/polars/lazy_frame.rb', line 1531
def join(
  other,
  left_on: nil,
  right_on: nil,
  on: nil,
  how: "inner",
  suffix: "_right",
  allow_parallel: true,
  force_parallel: false
)
  if !other.is_a?(LazyFrame)
    raise ArgumentError, "Expected a `LazyFrame` as join table, got #{other.class.name}"
  end
  if how == "cross"
    return _from_rbldf(
      _ldf.join(
        other._ldf, [], [], allow_parallel, force_parallel, how, suffix
      )
    )
  end
  if !on.nil?
    rbexprs = Utils.selection_to_rbexpr_list(on)
    rbexprs_left = rbexprs
    rbexprs_right = rbexprs
  elsif !left_on.nil? && !right_on.nil?
    rbexprs_left = Utils.selection_to_rbexpr_list(left_on)
    rbexprs_right = Utils.selection_to_rbexpr_list(right_on)
  else
    raise ArgumentError, "must specify `on` OR `left_on` and `right_on`"
  end
  _from_rbldf(
    self._ldf.join(
      other._ldf,
      rbexprs_left,
      rbexprs_right,
      allow_parallel,
      force_parallel,
      how,
      suffix,
    )
  )
end
#join_asof(other, left_on: nil, right_on: nil, on: nil, by_left: nil, by_right: nil, by: nil, strategy: "backward", suffix: "_right", tolerance: nil, allow_parallel: true, force_parallel: false) ⇒ LazyFrame
Perform an asof join.
This is similar to a left-join except that we match on nearest key rather than equal keys.
Both DataFrames must be sorted by the join_asof key.
For each row in the left DataFrame:
- A "backward" search selects the last row in the right DataFrame whose 'on' key is less than or equal to the left's key.
- A "forward" search selects the first row in the right DataFrame whose 'on' key is greater than or equal to the left's key.
The default is "backward".
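Because both sides are sorted by the key, the match for each left key can be found with a binary search. The sketch below models the "backward" and "forward" rules above on plain sorted arrays using Ruby's `Array#bsearch_index`; `asof_match` is a hypothetical helper, and treating `tolerance` as a maximum absolute key distance is an assumption of the sketch.

```ruby
# Hypothetical helper (not part of polars-ruby): position of the row in the
# sorted `right_keys` that an asof join would match for `key`, or nil.
def asof_match(right_keys, key, strategy: "backward", tolerance: nil)
  j =
    case strategy
    when "backward"
      # Last key <= key: one position before the first key > key.
      first_greater = right_keys.bsearch_index { |x| x > key } || right_keys.size
      first_greater - 1
    when "forward"
      # First key >= key.
      right_keys.bsearch_index { |x| x >= key }
    else
      raise ArgumentError, "unknown strategy: #{strategy}"
    end
  return nil if j.nil? || j.negative?
  # Optional numeric tolerance: reject matches that are too far away.
  return nil if tolerance && (right_keys[j] - key).abs > tolerance
  j
end

left_keys = [2, 5, 8]
right_keys = [1, 4, 7] # must be sorted
backward = left_keys.map { |k| asof_match(right_keys, k) }                     # => [0, 1, 2]
forward = left_keys.map { |k| asof_match(right_keys, k, strategy: "forward") } # => [1, 2, nil]
```

As in a left join, every left key is kept; a nil position corresponds to a row whose right-hand columns would be null.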
# File 'lib/polars/lazy_frame.rb', line 1356
def join_asof(
  other,
  left_on: nil,
  right_on: nil,
  on: nil,
  by_left: nil,
  by_right: nil,
  by: nil,
  strategy: "backward",
  suffix: "_right",
  tolerance: nil,
  allow_parallel: true,
  force_parallel: false
)
  if !other.is_a?(LazyFrame)
    raise ArgumentError, "Expected a `LazyFrame` as join table, got #{other.class.name}"
  end
  if on.is_a?(::String)
    left_on = on
    right_on = on
  end
  if left_on.nil? || right_on.nil?
    raise ArgumentError, "You should pass the column to join on as an argument."
  end
  if by_left.is_a?(::String) || by_left.is_a?(Expr)
    by_left_ = [by_left]
  else
    by_left_ = by_left
  end
  if by_right.is_a?(::String) || by_right.is_a?(Expr)
    by_right_ = [by_right]
  else
    by_right_ = by_right
  end
  if by.is_a?(::String)
    by_left_ = [by]
    by_right_ = [by]
  elsif by.is_a?(::Array)
    by_left_ = by
    by_right_ = by
  end
  tolerance_str = nil
  tolerance_num = nil
  if tolerance.is_a?(::String)
    tolerance_str = tolerance
  else
    tolerance_num = tolerance
  end
  _from_rbldf(
    _ldf.join_asof(
      other._ldf,
      Polars.col(left_on)._rbexpr,
      Polars.col(right_on)._rbexpr,
      by_left_,
      by_right_,
      allow_parallel,
      force_parallel,
      suffix,
      strategy,
      tolerance_num,
      tolerance_str
    )
  )
end
#last ⇒ LazyFrame
Get the last row of the DataFrame.
# File 'lib/polars/lazy_frame.rb', line 1927
def last
  tail(1)
end
#lazy ⇒ LazyFrame
Return lazy representation, i.e. itself.
Useful for writing code that expects either a DataFrame or
LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 691
def lazy
  self
end
#limit(n = 5) ⇒ LazyFrame
Get the first n rows.
# File 'lib/polars/lazy_frame.rb', line 1895
def limit(n = 5)
  head(n)
end
#max ⇒ LazyFrame
Aggregate the columns in the DataFrame to their maximum value.
# File 'lib/polars/lazy_frame.rb', line 2120
def max
  _from_rbldf(_ldf.max)
end
#mean ⇒ LazyFrame
Aggregate the columns in the DataFrame to their mean value.
# File 'lib/polars/lazy_frame.rb', line 2180
def mean
  _from_rbldf(_ldf.mean)
end
#median ⇒ LazyFrame
Aggregate the columns in the DataFrame to their median value.
# File 'lib/polars/lazy_frame.rb', line 2200
def median
  _from_rbldf(_ldf.median)
end
#melt(id_vars: nil, value_vars: nil, variable_name: nil, value_name: nil, streamable: true) ⇒ LazyFrame
Unpivot a DataFrame from wide to long format.
Optionally leaves identifiers set.
This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are "unpivoted" to the row axis, leaving just two non-identifier columns, 'variable' and 'value'.
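The unpivot can be modeled on an array of row hashes: every measured column contributes one output row per input row, holding the identifier values plus the column name ("variable") and its value ("value"). `melt_rows` is a hypothetical helper, not polars-ruby API; stacking the output one measured column at a time, and treating an empty `value_vars` as "all non-identifier columns", are assumptions of the sketch.

```ruby
# Hypothetical helper (not part of polars-ruby): models LazyFrame#melt on an
# array of row hashes that all share the same keys.
def melt_rows(rows, id_vars:, value_vars: nil, variable_name: "variable", value_name: "value")
  return [] if rows.empty?
  # Assumption: no value_vars means every non-identifier column is unpivoted.
  value_vars = rows.first.keys - id_vars if value_vars.nil? || value_vars.empty?
  # Output is stacked one measured column at a time.
  value_vars.flat_map do |var|
    rows.map do |row|
      row.slice(*id_vars).merge(variable_name => var, value_name => row[var])
    end
  end
end

rows = [
  { "a" => "x", "b" => 1, "c" => 2 },
  { "a" => "y", "b" => 3, "c" => 4 },
  { "a" => "z", "b" => 5, "c" => 6 }
]
long = melt_rows(rows, id_vars: ["a"])
# long.first == {"a"=>"x", "variable"=>"b", "value"=>1}
```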
# File 'lib/polars/lazy_frame.rb', line 2365
def melt(id_vars: nil, value_vars: nil, variable_name: nil, value_name: nil, streamable: true)
  if value_vars.is_a?(::String)
    value_vars = [value_vars]
  end
  if id_vars.is_a?(::String)
    id_vars = [id_vars]
  end
  if value_vars.nil?
    value_vars = []
  end
  if id_vars.nil?
    id_vars = []
  end
  _from_rbldf(
    _ldf.melt(id_vars, value_vars, value_name, variable_name, streamable)
  )
end
#min ⇒ LazyFrame
Aggregate the columns in the DataFrame to their minimum value.
# File 'lib/polars/lazy_frame.rb', line 2140
def min
  _from_rbldf(_ldf.min)
end
#pipe(func, *args, **kwargs, &block) ⇒ LazyFrame
Offers a structured way to apply a sequence of user-defined functions (UDFs).
# File 'lib/polars/lazy_frame.rb', line 324
def pipe(func, *args, **kwargs, &block)
  func.call(self, *args, **kwargs, &block)
end
#quantile(quantile, interpolation: "nearest") ⇒ LazyFrame
Aggregate the columns in the DataFrame to their quantile value.
# File 'lib/polars/lazy_frame.rb', line 2225
def quantile(quantile, interpolation: "nearest")
  quantile = Utils.expr_to_lit_or_expr(quantile, str_to_lit: false)
  _from_rbldf(_ldf.quantile(quantile._rbexpr, interpolation))
end
#rename(mapping) ⇒ LazyFrame
Rename column names.
# File 'lib/polars/lazy_frame.rb', line 1737
def rename(mapping)
  existing = mapping.keys
  _new = mapping.values
  _from_rbldf(_ldf.rename(existing, _new))
end
#reverse ⇒ LazyFrame
Reverse the DataFrame.
# File 'lib/polars/lazy_frame.rb', line 1746
def reverse
  _from_rbldf(_ldf.reverse)
end
#schema ⇒ Hash
Get the schema.
# File 'lib/polars/lazy_frame.rb', line 240
def schema
  _ldf.schema
end
#select(exprs) ⇒ LazyFrame
Select columns from this DataFrame.
# File 'lib/polars/lazy_frame.rb', line 858
def select(exprs)
  exprs = Utils.selection_to_rbexpr_list(exprs)
  _from_rbldf(_ldf.select(exprs))
end
#set_sorted(column, *more_columns, descending: false) ⇒ LazyFrame
Indicate that one or multiple columns are sorted.
# File 'lib/polars/lazy_frame.rb', line 2487
def set_sorted(
  column,
  *more_columns,
  descending: false
)
  columns = Utils.selection_to_rbexpr_list(column)
  if more_columns.any?
    columns.concat(Utils.selection_to_rbexpr_list(more_columns))
  end
  with_columns(
    columns.map { |e| Utils.wrap_expr(e).set_sorted(descending: descending) }
  )
end
#shift(n, fill_value: nil) ⇒ LazyFrame
Shift the values by a given period.
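The direction of a shift is easy to mix up, so here is a model on a plain array: a positive n moves values toward the end, a negative n toward the start, and vacated slots receive `fill_value` (nil, i.e. null, by default). `shift_values` is a hypothetical helper, not part of polars-ruby.

```ruby
# Hypothetical helper (not part of polars-ruby): models shift(n, fill_value:)
# on a plain array. Positive n moves values toward the end, negative n
# toward the start; vacated slots get fill_value (nil by default).
def shift_values(values, n, fill_value: nil)
  values.each_index.map do |i|
    source = i - n
    source.between?(0, values.size - 1) ? values[source] : fill_value
  end
end

shift_values([1, 2, 3, 4], 1)                 # => [nil, 1, 2, 3]
shift_values([1, 2, 3, 4], -2, fill_value: 0) # => [3, 4, 0, 0]
```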
# File 'lib/polars/lazy_frame.rb', line 1792
def shift(n, fill_value: nil)
  if !fill_value.nil?
    fill_value = Utils.parse_as_expression(fill_value, str_as_lit: true)
  end
  n = Utils.parse_as_expression(n)
  _from_rbldf(_ldf.shift(n, fill_value))
end
#shift_and_fill(periods, fill_value) ⇒ LazyFrame
Shift the values by a given period and fill the resulting null values.
# File 'lib/polars/lazy_frame.rb', line 1842
def shift_and_fill(periods, fill_value)
  shift(periods, fill_value: fill_value)
end
#sink_parquet(path, compression: "zstd", compression_level: nil, statistics: false, row_group_size: nil, data_pagesize_limit: nil, maintain_order: true, type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, no_optimization: false, slice_pushdown: true) ⇒ DataFrame
Persists a LazyFrame at the provided path.
This allows streaming results that are larger than RAM to be written to disk.
# File 'lib/polars/lazy_frame.rb', line 548
def sink_parquet(
  path,
  compression: "zstd",
  compression_level: nil,
  statistics: false,
  row_group_size: nil,
  data_pagesize_limit: nil,
  maintain_order: true,
  type_coercion: true,
  predicate_pushdown: true,
  projection_pushdown: true,
  simplify_expression: true,
  no_optimization: false,
  slice_pushdown: true
)
  if no_optimization
    predicate_pushdown = false
    projection_pushdown = false
    slice_pushdown = false
  end
  lf = _ldf.optimization_toggle(
    type_coercion,
    predicate_pushdown,
    projection_pushdown,
    simplify_expression,
    slice_pushdown,
    false,
    true,
    false
  )
  lf.sink_parquet(
    path,
    compression,
    compression_level,
    statistics,
    row_group_size,
    data_pagesize_limit,
    maintain_order
  )
end
#slice(offset, length = nil) ⇒ LazyFrame
Get a slice of this DataFrame.
# File 'lib/polars/lazy_frame.rb', line 1875
def slice(offset, length = nil)
  if length && length < 0
    raise ArgumentError, "Negative slice lengths (#{length}) are invalid for LazyFrame"
  end
  _from_rbldf(_ldf.slice(offset, length))
end
#sort(by, reverse: false, nulls_last: false, maintain_order: false) ⇒ LazyFrame
Sort the DataFrame.
Sorting can be done by:
- A single column name
- An expression
- Multiple expressions
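When sorting by multiple keys, `reverse` can be a single boolean (the code below wraps a lone boolean in an array) or one boolean per key. The sketch models a multi-key sort on an array of row hashes, comparing keys in order and letting the first differing key decide; `sort_rows` is a hypothetical helper, ties keep their input order (similar in spirit to `maintain_order: true`), and null placement (`nulls_last`) is not modeled.

```ruby
# Hypothetical helper (not part of polars-ruby): models sorting an array of
# row hashes by several keys. `reverse` is either one boolean for every key
# or an array with one boolean per key. Ties keep their original order.
# Nulls are not modeled.
def sort_rows(rows, by, reverse: false)
  flags = reverse.is_a?(Array) ? reverse : Array.new(by.size, reverse)
  indexed = rows.each_with_index.to_a
  sorted = indexed.sort do |(a, ia), (b, ib)|
    cmp = 0
    by.each_with_index do |col, k|
      cmp = a[col] <=> b[col]
      cmp = -cmp if flags[k]
      break unless cmp.zero? # first differing key decides
    end
    cmp.zero? ? ia <=> ib : cmp
  end
  sorted.map(&:first)
end

rows = [
  { "g" => "a", "v" => 1 },
  { "g" => "b", "v" => 2 },
  { "g" => "a", "v" => 3 }
]
# "g" ascending, then "v" descending within each group.
by_group = sort_rows(rows, ["g", "v"], reverse: [false, true])
```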
# File 'lib/polars/lazy_frame.rb', line 401
def sort(by, reverse: false, nulls_last: false, maintain_order: false)
  if by.is_a?(::String)
    return _from_rbldf(_ldf.sort(by, reverse, nulls_last, maintain_order))
  end
  if Utils.bool?(reverse)
    reverse = [reverse]
  end
  by = Utils.selection_to_rbexpr_list(by)
  _from_rbldf(_ldf.sort_by_exprs(by, reverse, nulls_last, maintain_order))
end
#std(ddof: 1) ⇒ LazyFrame
Aggregate the columns in the DataFrame to their standard deviation value.
# File 'lib/polars/lazy_frame.rb', line 2068
def std(ddof: 1)
  _from_rbldf(_ldf.std(ddof))
end
#sum ⇒ LazyFrame
Aggregate the columns in the DataFrame to their sum value.
# File 'lib/polars/lazy_frame.rb', line 2160
def sum
  _from_rbldf(_ldf.sum)
end
#tail(n = 5) ⇒ LazyFrame
Get the last n rows.
# File 'lib/polars/lazy_frame.rb', line 1920
def tail(n = 5)
  _from_rbldf(_ldf.tail(n))
end
#take_every(n) ⇒ LazyFrame
Take every nth row in the LazyFrame and return as a new LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 1991
def take_every(n)
  select(Utils.col("*").take_every(n))
end
#to_s ⇒ String
Returns a string representing the LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 271
def to_s
  <<~EOS
    naive plan: (run LazyFrame#describe_optimized_plan to see the optimized plan)

    #{describe_plan}
  EOS
end
#unique(maintain_order: true, subset: nil, keep: "first") ⇒ LazyFrame
Drop duplicate rows from this DataFrame.
Note that this fails if there is a column of type List in the DataFrame or
subset.
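Deduplication with `subset` and `keep` can be modeled on an array of row hashes: rows are compared only on the subset columns (or on all columns), and `keep` decides which occurrence survives. `unique_rows` is a hypothetical helper; only `keep: "first"` and `keep: "last"` are modeled, and surviving rows stay in their original relative order.

```ruby
# Hypothetical helper (not part of polars-ruby): models unique(subset:, keep:)
# on an array of row hashes, keeping rows in their original relative order.
# Only keep: "first" and keep: "last" are modeled.
def unique_rows(rows, subset: nil, keep: "first")
  key = ->(row) { subset ? row.values_at(*subset) : row.values }
  case keep
  when "first" then rows.uniq(&key)
  when "last" then rows.reverse.uniq(&key).reverse
  else raise ArgumentError, "unsupported keep: #{keep.inspect}"
  end
end

rows = [
  { "a" => 1, "b" => "x" },
  { "a" => 2, "b" => "y" },
  { "a" => 1, "b" => "z" }
]
firsts = unique_rows(rows, subset: ["a"])             # rows with "b" => "x" and "y"
lasts = unique_rows(rows, subset: ["a"], keep: "last") # rows with "b" => "y" and "z"
```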
# File 'lib/polars/lazy_frame.rb', line 2277
def unique(maintain_order: true, subset: nil, keep: "first")
  if !subset.nil? && !subset.is_a?(::Array)
    subset = [subset]
  end
  _from_rbldf(_ldf.unique(maintain_order, subset, keep))
end
#unnest(names) ⇒ LazyFrame
Decompose a struct into its fields.
The fields are inserted into the DataFrame at the location of the struct column.
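The "inserted at the location of the struct" rule can be pictured with an ordered Ruby hash standing in for one row: the struct entry is replaced, in place, by its fields. `unnest_row` is a hypothetical helper, not part of polars-ruby.

```ruby
# Hypothetical helper (not part of polars-ruby): models unnest on one row
# hash, replacing the struct column `name` with its fields in place.
def unnest_row(row, name)
  row.each_with_object({}) do |(key, value), out|
    if key == name
      value.each { |field, field_value| out[field] = field_value }
    else
      out[key] = value
    end
  end
end

row = { "a" => 1, "point" => { "x" => 2, "y" => 3 }, "b" => 4 }
flat = unnest_row(row, "point")
# flat.keys == ["a", "x", "y", "b"]
```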
# File 'lib/polars/lazy_frame.rb', line 2466
def unnest(names)
  if names.is_a?(::String)
    names = [names]
  end
  _from_rbldf(_ldf.unnest(names))
end
#var(ddof: 1) ⇒ LazyFrame
Aggregate the columns in the DataFrame to their variance value.
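Both `var` and `std` take `ddof` (delta degrees of freedom): the sum of squared deviations from the mean is divided by n - ddof, so `ddof: 1` gives the sample estimate and `ddof: 0` the population one, and the standard deviation is the square root of that variance. A plain-Ruby sketch of the formula follows; `variance` and `std_dev` are hypothetical helpers, and raising when n <= ddof is this sketch's choice, not necessarily what polars does.

```ruby
# Hypothetical helpers (not part of polars-ruby): variance and standard
# deviation with a ddof (delta degrees of freedom) divisor of n - ddof.
def variance(values, ddof: 1)
  n = values.size
  raise ArgumentError, "need more than #{ddof} values" if n <= ddof
  mean = values.sum.fdiv(n)
  values.sum { |x| (x - mean)**2 } / (n - ddof)
end

def std_dev(values, ddof: 1)
  Math.sqrt(variance(values, ddof: ddof))
end

variance([1, 2, 3, 4])          # sample variance: 5.0 / 3
variance([1, 2, 3, 4], ddof: 0) # => 1.25
```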
# File 'lib/polars/lazy_frame.rb', line 2100
def var(ddof: 1)
  _from_rbldf(_ldf.var(ddof))
end
#width ⇒ Integer
Get the width of the LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 252
def width
  _ldf.width
end
#with_column(column) ⇒ LazyFrame
Add or overwrite column in a DataFrame.
# File 'lib/polars/lazy_frame.rb', line 1713
def with_column(column)
  with_columns([column])
end
#with_columns(exprs) ⇒ LazyFrame
Add or overwrite multiple columns in a DataFrame.
# File 'lib/polars/lazy_frame.rb', line 1611
def with_columns(exprs)
  exprs =
    if exprs.nil?
      []
    elsif exprs.is_a?(Expr)
      [exprs]
    else
      exprs.to_a
    end
  rbexprs = []
  exprs.each do |e|
    case e
    when Expr
      rbexprs << e._rbexpr
    when Series
      rbexprs << Utils.lit(e)._rbexpr
    else
      raise ArgumentError, "Expected an expression, got #{e}"
    end
  end
  _from_rbldf(_ldf.with_columns(rbexprs))
end
#with_context(other) ⇒ LazyFrame
Add an external context to the computation graph.
This allows expressions to also access columns from DataFrames that are not part of this one.
# File 'lib/polars/lazy_frame.rb', line 1665
def with_context(other)
  if !other.is_a?(::Array)
    other = [other]
  end
  _from_rbldf(_ldf.with_context(other.map(&:_ldf)))
end
#with_row_count(name: "row_nr", offset: 0) ⇒ LazyFrame
Add a column at index 0 that counts the rows.
This can have a negative effect on query performance; for instance, it may block predicate pushdown optimization.
# File 'lib/polars/lazy_frame.rb', line 1970
def with_row_count(name: "row_nr", offset: 0)
  _from_rbldf(_ldf.with_row_count(name, offset))
end
#write_json(file) ⇒ nil
Write the logical plan of this LazyFrame to a file or string in JSON format.
# File 'lib/polars/lazy_frame.rb', line 285
def write_json(file)
  if Utils.pathlike?(file)
    file = Utils.normalise_filepath(file)
  end
  _ldf.write_json(file)
  nil
end