Class: Polars::LazyFrame
- Inherits: Object
  - Object
  - Polars::LazyFrame
- Defined in: lib/polars/lazy_frame.rb
Overview
Representation of a lazy computation graph/query against a DataFrame.
Class Method Summary
-
.read_json(file) ⇒ LazyFrame
Read a logical plan from a JSON file to construct a LazyFrame.
Instance Method Summary
-
#cache ⇒ LazyFrame
Cache the result once the execution of the physical plan hits this node.
-
#cleared ⇒ LazyFrame
Create an empty copy of the current LazyFrame.
-
#collect(type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ DataFrame
Collect into a DataFrame.
-
#columns ⇒ Array
Get column names.
-
#describe_optimized_plan(type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ String
Create a string representation of the optimized query plan.
-
#describe_plan ⇒ String
Create a string representation of the unoptimized query plan.
-
#drop(columns) ⇒ LazyFrame
Remove one or multiple columns from a DataFrame.
-
#drop_nulls(subset: nil) ⇒ LazyFrame
Drop rows with null values from this LazyFrame.
-
#dtypes ⇒ Array
Get dtypes of columns in LazyFrame.
-
#explode(columns) ⇒ LazyFrame
Explode lists to long format.
-
#fetch(n_rows = 500, type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ DataFrame
Collect a small number of rows for debugging purposes.
-
#fill_nan(fill_value) ⇒ LazyFrame
Fill floating point NaN values.
-
#fill_null(value = nil, strategy: nil, limit: nil, matches_supertype: nil) ⇒ LazyFrame
Fill null values using the specified value or strategy.
-
#filter(predicate) ⇒ LazyFrame
Filter the rows in the DataFrame based on a predicate expression.
-
#first ⇒ LazyFrame
Get the first row of the DataFrame.
-
#groupby(by, maintain_order: false) ⇒ LazyGroupBy
Start a groupby operation.
-
#groupby_dynamic(index_column, every:, period: nil, offset: nil, truncate: true, include_boundaries: false, closed: "left", by: nil, start_by: "window") ⇒ DataFrame
Group based on a time value (or index value of type :i32, :i64).
-
#groupby_rolling(index_column:, period:, offset: nil, closed: "right", by: nil) ⇒ LazyFrame
Create rolling groups based on a time column.
-
#head(n = 5) ⇒ LazyFrame
Get the first n rows.
-
#include?(key) ⇒ Boolean
Check if LazyFrame includes key.
-
#interpolate ⇒ LazyFrame
Interpolate intermediate values.
-
#join(other, left_on: nil, right_on: nil, on: nil, how: "inner", suffix: "_right", allow_parallel: true, force_parallel: false) ⇒ LazyFrame
Add a join operation to the Logical Plan.
-
#join_asof(other, left_on: nil, right_on: nil, on: nil, by_left: nil, by_right: nil, by: nil, strategy: "backward", suffix: "_right", tolerance: nil, allow_parallel: true, force_parallel: false) ⇒ LazyFrame
Perform an asof join.
-
#last ⇒ LazyFrame
Get the last row of the DataFrame.
-
#lazy ⇒ LazyFrame
Return lazy representation, i.e. itself.
-
#limit(n = 5) ⇒ LazyFrame
Get the first n rows.
-
#max ⇒ LazyFrame
Aggregate the columns in the DataFrame to their maximum value.
-
#mean ⇒ LazyFrame
Aggregate the columns in the DataFrame to their mean value.
-
#median ⇒ LazyFrame
Aggregate the columns in the DataFrame to their median value.
-
#melt(id_vars: nil, value_vars: nil, variable_name: nil, value_name: nil) ⇒ LazyFrame
Unpivot a DataFrame from wide to long format.
-
#min ⇒ LazyFrame
Aggregate the columns in the DataFrame to their minimum value.
-
#pipe(func, *args, **kwargs, &block) ⇒ LazyFrame
Offers a structured way to apply a sequence of user-defined functions (UDFs).
-
#quantile(quantile, interpolation: "nearest") ⇒ LazyFrame
Aggregate the columns in the DataFrame to their quantile value.
-
#rename(mapping) ⇒ LazyFrame
Rename column names.
-
#reverse ⇒ LazyFrame
Reverse the DataFrame.
-
#schema ⇒ Hash
Get the schema.
-
#select(exprs) ⇒ LazyFrame
Select columns from this DataFrame.
-
#shift(periods) ⇒ LazyFrame
Shift the values by a given period.
-
#shift_and_fill(periods, fill_value) ⇒ LazyFrame
Shift the values by a given period and fill the resulting null values.
-
#slice(offset, length = nil) ⇒ LazyFrame
Get a slice of this DataFrame.
-
#sort(by, reverse: false, nulls_last: false) ⇒ LazyFrame
Sort the DataFrame.
-
#std(ddof: 1) ⇒ LazyFrame
Aggregate the columns in the DataFrame to their standard deviation value.
-
#sum ⇒ LazyFrame
Aggregate the columns in the DataFrame to their sum value.
-
#tail(n = 5) ⇒ LazyFrame
Get the last n rows.
-
#take_every(n) ⇒ LazyFrame
Take every nth row in the LazyFrame and return as a new LazyFrame.
-
#to_s ⇒ String
Returns a string representing the LazyFrame.
-
#unique(maintain_order: true, subset: nil, keep: "first") ⇒ LazyFrame
Drop duplicate rows from this DataFrame.
-
#unnest(names) ⇒ LazyFrame
Decompose a struct into its fields.
-
#var(ddof: 1) ⇒ LazyFrame
Aggregate the columns in the DataFrame to their variance value.
-
#width ⇒ Integer
Get the width of the LazyFrame.
-
#with_column(column) ⇒ LazyFrame
Add or overwrite column in a DataFrame.
-
#with_columns(exprs) ⇒ LazyFrame
Add or overwrite multiple columns in a DataFrame.
-
#with_context(other) ⇒ LazyFrame
Add an external context to the computation graph.
-
#with_row_count(name: "row_nr", offset: 0) ⇒ LazyFrame
Add a column at index 0 that counts the rows.
-
#write_json(file) ⇒ nil
Write the logical plan of this LazyFrame to a file or string in JSON format.
Class Method Details
.read_json(file) ⇒ LazyFrame
Read a logical plan from a JSON file to construct a LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 158
def self.read_json(file)
  if file.is_a?(String) || (defined?(Pathname) && file.is_a?(Pathname))
    file = Utils.format_path(file)
  end

  Utils.wrap_ldf(RbLazyFrame.read_json(file))
end
Instance Method Details
#cache ⇒ LazyFrame
Cache the result once the execution of the physical plan hits this node.
# File 'lib/polars/lazy_frame.rb', line 591
def cache
  _from_rbldf(_ldf.cache)
end
#cleared ⇒ LazyFrame
Create an empty copy of the current LazyFrame.
The copy has an identical schema but no data.
# File 'lib/polars/lazy_frame.rb', line 618
def cleared
  DataFrame.new(columns: schema).lazy
end
#collect(type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ DataFrame
Collect into a DataFrame.
Note: use #fetch if you want to run your query on the first n rows only. This can be a huge time saver in debugging queries.
# File 'lib/polars/lazy_frame.rb', line 449
def collect(
  type_coercion: true,
  predicate_pushdown: true,
  projection_pushdown: true,
  simplify_expression: true,
  string_cache: false,
  no_optimization: false,
  slice_pushdown: true,
  common_subplan_elimination: true,
  allow_streaming: false
)
  if no_optimization
    predicate_pushdown = false
    projection_pushdown = false
    slice_pushdown = false
    common_subplan_elimination = false
  end

  if allow_streaming
    common_subplan_elimination = false
  end

  ldf = _ldf.optimization_toggle(
    type_coercion,
    predicate_pushdown,
    projection_pushdown,
    simplify_expression,
    slice_pushdown,
    common_subplan_elimination,
    allow_streaming
  )
  Utils.wrap_df(ldf.collect)
end
#columns ⇒ Array
Get column names.
# File 'lib/polars/lazy_frame.rb', line 184
def columns
  _ldf.columns
end
#describe_optimized_plan(type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ String
Create a string representation of the optimized query plan.
# File 'lib/polars/lazy_frame.rb', line 321
def describe_optimized_plan(
  type_coercion: true,
  predicate_pushdown: true,
  projection_pushdown: true,
  simplify_expression: true,
  slice_pushdown: true,
  common_subplan_elimination: true,
  allow_streaming: false
)
  ldf = _ldf.optimization_toggle(
    type_coercion,
    predicate_pushdown,
    projection_pushdown,
    simplify_expression,
    slice_pushdown,
    common_subplan_elimination,
    allow_streaming,
  )

  ldf.describe_optimized_plan
end
#describe_plan ⇒ String
Create a string representation of the unoptimized query plan.
# File 'lib/polars/lazy_frame.rb', line 314
def describe_plan
  _ldf.describe_plan
end
#drop(columns) ⇒ LazyFrame
Remove one or multiple columns from a DataFrame.
# File 'lib/polars/lazy_frame.rb', line 1656
def drop(columns)
  if columns.is_a?(String)
    columns = [columns]
  end
  _from_rbldf(_ldf.drop_columns(columns))
end
#drop_nulls(subset: nil) ⇒ LazyFrame
Drop rows with null values from this LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 2262
def drop_nulls(subset: nil)
  if !subset.nil? && !subset.is_a?(Array)
    subset = [subset]
  end
  _from_rbldf(_ldf.drop_nulls(subset))
end
#dtypes ⇒ Array
Get dtypes of columns in LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 202
def dtypes
  _ldf.dtypes
end
#explode(columns) ⇒ LazyFrame
Explode lists to long format.
# File 'lib/polars/lazy_frame.rb', line 2209
def explode(columns)
  columns = Utils.selection_to_rbexpr_list(columns)
  _from_rbldf(_ldf.explode(columns))
end
#fetch(n_rows = 500, type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ DataFrame
Collect a small number of rows for debugging purposes.
Fetch is like a #collect operation, but it overwrites the number of rows read by every scan operation. This is a utility that helps debug a query on a smaller number of rows.
Note that the fetch does not guarantee the final number of rows in the DataFrame. Filter, join operations and a lower number of rows available in the scanned file influence the final number of rows.
# File 'lib/polars/lazy_frame.rb', line 537
def fetch(
  n_rows = 500,
  type_coercion: true,
  predicate_pushdown: true,
  projection_pushdown: true,
  simplify_expression: true,
  string_cache: false,
  no_optimization: false,
  slice_pushdown: true,
  common_subplan_elimination: true,
  allow_streaming: false
)
  if no_optimization
    predicate_pushdown = false
    projection_pushdown = false
    slice_pushdown = false
    common_subplan_elimination = false
  end

  ldf = _ldf.optimization_toggle(
    type_coercion,
    predicate_pushdown,
    projection_pushdown,
    simplify_expression,
    slice_pushdown,
    common_subplan_elimination,
    allow_streaming
  )
  Utils.wrap_df(ldf.fetch(n_rows))
end
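The caveat about the final row count can be illustrated without Polars at all. The following is a minimal plain-Ruby sketch, assuming a hypothetical in-memory "scan" (`ROWS`) and a hypothetical helper `fetch_sketch`; it only mimics the idea of capping the rows read at the scan before the rest of the query runs, and is not how the library implements #fetch.

```ruby
# Stand-in for a scanned source with 10 rows (hypothetical data, not Polars).
ROWS = (1..10).map { |i| { "id" => i, "even" => i.even? } }

# Cap the number of rows read by the "scan", then apply a filter to
# whatever was read. The cap applies to the input, not to the result.
def fetch_sketch(rows, n_rows)
  scanned = rows.first(n_rows)          # limit applied at the scan
  scanned.select { |row| row["even"] }  # filter runs on the capped input
end

# The scan reads 4 rows, but fewer rows survive the filter.
puts fetch_sketch(ROWS, 4).length
```

Asking for more rows than the source holds behaves the same way: the result is bounded by what the scan can deliver and by what the filter keeps, not by `n_rows` itself.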
#fill_nan(fill_value) ⇒ LazyFrame
Fill floating point NaN values.
Note that floating point NaN (Not a Number) values are not missing values! To replace missing values, use fill_null instead.
# File 'lib/polars/lazy_frame.rb', line 1977
def fill_nan(fill_value)
  if !fill_value.is_a?(Expr)
    fill_value = Utils.lit(fill_value)
  end
  _from_rbldf(_ldf.fill_nan(fill_value._rbexpr))
end
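The distinction between NaN and missing values also exists in plain Ruby, where `Float::NAN` is a regular Float and `nil` plays the role of a missing value. The sketch below uses two hypothetical helpers (`fill_nan_sketch`, `fill_null_sketch`) on an ordinary Array to show that each kind of fill leaves the other kind of value untouched; it does not call Polars.

```ruby
# Replace only NaN entries; nil (missing) entries are left alone.
def fill_nan_sketch(values, fill_value)
  values.map { |v| v.is_a?(Float) && v.nan? ? fill_value : v }
end

# Replace only nil (missing) entries; NaN entries are left alone.
def fill_null_sketch(values, fill_value)
  values.map { |v| v.nil? ? fill_value : v }
end

SAMPLE_VALUES = [1.0, Float::NAN, nil, 4.0]

p fill_nan_sketch(SAMPLE_VALUES, 0.0)
p fill_null_sketch(SAMPLE_VALUES, 0.0)
```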
#fill_null(value = nil, strategy: nil, limit: nil, matches_supertype: nil) ⇒ LazyFrame
Fill null values using the specified value or strategy.
# File 'lib/polars/lazy_frame.rb', line 1939
def fill_null(value = nil, strategy: nil, limit: nil, matches_supertype: nil)
  select(Polars.all.fill_null(value, strategy: strategy, limit: limit))
end
#filter(predicate) ⇒ LazyFrame
Filter the rows in the DataFrame based on a predicate expression.
# File 'lib/polars/lazy_frame.rb', line 661
def filter(predicate)
  _from_rbldf(
    _ldf.filter(
      Utils.expr_to_lit_or_expr(predicate, str_to_lit: false)._rbexpr
    )
  )
end
#first ⇒ LazyFrame
Get the first row of the DataFrame.
# File 'lib/polars/lazy_frame.rb', line 1872
def first
  slice(0, 1)
end
#groupby(by, maintain_order: false) ⇒ LazyGroupBy
Start a groupby operation.
# File 'lib/polars/lazy_frame.rb', line 799
def groupby(by, maintain_order: false)
  rbexprs_by = Utils.selection_to_rbexpr_list(by)
  lgb = _ldf.groupby(rbexprs_by, maintain_order)
  LazyGroupBy.new(lgb, self.class)
end
#groupby_dynamic(index_column, every:, period: nil, offset: nil, truncate: true, include_boundaries: false, closed: "left", by: nil, start_by: "window") ⇒ DataFrame
Group based on a time value (or index value of type :i32, :i64).
Time windows are calculated and rows are assigned to windows. Unlike a normal groupby, a row can be a member of multiple groups. The time/index window can be seen as a rolling window whose size is determined by dates/times/values instead of slots in the DataFrame.
A window is defined by:
- every: interval of the window
- period: length of the window
- offset: offset of the window
The every, period and offset arguments are created with the following string language:
- 1ns (1 nanosecond)
- 1us (1 microsecond)
- 1ms (1 millisecond)
- 1s (1 second)
- 1m (1 minute)
- 1h (1 hour)
- 1d (1 day)
- 1w (1 week)
- 1mo (1 calendar month)
- 1y (1 calendar year)
- 1i (1 index count)
Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds
In case of a groupby_dynamic on an integer column, the windows are defined by:
- "1i" # length 1
- "10i" # length 10
# File 'lib/polars/lazy_frame.rb', line 1168
def groupby_dynamic(
  index_column,
  every:,
  period: nil,
  offset: nil,
  truncate: true,
  include_boundaries: false,
  closed: "left",
  by: nil,
  start_by: "window"
)
  if offset.nil?
    if period.nil?
      offset = "-#{every}"
    else
      offset = "0ns"
    end
  end

  if period.nil?
    period = every
  end

  period = Utils._timedelta_to_pl_duration(period)
  offset = Utils._timedelta_to_pl_duration(offset)
  every = Utils._timedelta_to_pl_duration(every)

  rbexprs_by = by.nil? ? [] : Utils.selection_to_rbexpr_list(by)
  lgb = _ldf.groupby_dynamic(
    index_column,
    every,
    period,
    offset,
    truncate,
    include_boundaries,
    closed,
    rbexprs_by,
    start_by
  )
  LazyGroupBy.new(lgb, self.class)
end
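To make the duration string language listed above concrete, here is a minimal plain-Ruby sketch that splits a string such as "3d12h4m25s" into (count, unit) pairs. The constant `DURATION_UNIT` and the helper `parse_duration_sketch` are hypothetical names introduced for illustration; the library parses these strings internally, and this sketch does not claim to match that parser (for example, it does not handle a leading minus sign).

```ruby
# Units from the list above. Longer suffixes come first so that "ms" and
# "mo" are not mistaken for "m" followed by another character.
DURATION_UNIT = /(\d+)(ns|us|ms|mo|s|m|h|d|w|y|i)/

# Split a duration string into [count, unit] pairs, rejecting any string
# that contains characters outside the count+unit pattern.
def parse_duration_sketch(text)
  parts = text.scan(DURATION_UNIT)
  unless parts.map(&:join).join == text
    raise ArgumentError, "invalid duration: #{text}"
  end
  parts.map { |count, unit| [count.to_i, unit] }
end

p parse_duration_sketch("3d12h4m25s")
p parse_duration_sketch("10i")
```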
#groupby_rolling(index_column:, period:, offset: nil, closed: "right", by: nil) ⇒ LazyFrame
Create rolling groups based on a time column.
Also works for index values of type :i32 or :i64.
Unlike groupby_dynamic, the windows are determined by the individual values and are not of constant intervals. For constant intervals, use groupby_dynamic.
The period and offset arguments are created either from a timedelta, or by using the following string language:
- 1ns (1 nanosecond)
- 1us (1 microsecond)
- 1ms (1 millisecond)
- 1s (1 second)
- 1m (1 minute)
- 1h (1 hour)
- 1d (1 day)
- 1w (1 week)
- 1mo (1 calendar month)
- 1y (1 calendar year)
- 1i (1 index count)
Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds
In case of a groupby_rolling on an integer column, the windows are defined by:
- "1i" # length 1
- "10i" # length 10
# File 'lib/polars/lazy_frame.rb', line 894
def groupby_rolling(
  index_column:,
  period:,
  offset: nil,
  closed: "right",
  by: nil
)
  if offset.nil?
    offset = "-#{period}"
  end

  rbexprs_by = by.nil? ? [] : Utils.selection_to_rbexpr_list(by)
  period = Utils._timedelta_to_pl_duration(period)
  offset = Utils._timedelta_to_pl_duration(offset)

  lgb = _ldf.groupby_rolling(
    index_column, period, offset, closed, rbexprs_by
  )
  LazyGroupBy.new(lgb, self.class)
end
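The idea that windows follow the individual values can be sketched in plain Ruby for an integer index. The helper `rolling_windows_sketch` below is hypothetical and rests on an assumption about the semantics: with the default offset of minus one period and closed: "right", each value t is taken to own the window (t - period, t]. It illustrates the concept only and does not call Polars.

```ruby
# For every index value t, collect the values falling in (t - period, t].
# Assumption: this mirrors the defaults offset = -period, closed = "right".
def rolling_windows_sketch(index_values, period)
  index_values.map do |t|
    members = index_values.select { |v| v > t - period && v <= t }
    [t, members]
  end
end

rolling_windows_sketch([1, 2, 3, 5, 8], 2).each do |t, members|
  puts "#{t}: #{members.inspect}"
end
```

Because each window is anchored to a row's own value, gaps in the index (such as the jump from 5 to 8) produce windows with fewer members rather than empty fixed-width buckets.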
#head(n = 5) ⇒ LazyFrame
Get the first n rows.
# File 'lib/polars/lazy_frame.rb', line 1848
def head(n = 5)
  slice(0, n)
end
#include?(key) ⇒ Boolean
Check if LazyFrame includes key.
# File 'lib/polars/lazy_frame.rb', line 239
def include?(key)
  columns.include?(key)
end
#interpolate ⇒ LazyFrame
Interpolate intermediate values. The interpolation method is linear.
# File 'lib/polars/lazy_frame.rb', line 2367
def interpolate
  select(Utils.col("*").interpolate)
end
#join(other, left_on: nil, right_on: nil, on: nil, how: "inner", suffix: "_right", allow_parallel: true, force_parallel: false) ⇒ LazyFrame
Add a join operation to the Logical Plan.
# File 'lib/polars/lazy_frame.rb', line 1454
def join(
  other,
  left_on: nil,
  right_on: nil,
  on: nil,
  how: "inner",
  suffix: "_right",
  allow_parallel: true,
  force_parallel: false
)
  if !other.is_a?(LazyFrame)
    raise ArgumentError, "Expected a `LazyFrame` as join table, got #{other.class.name}"
  end

  if how == "cross"
    return _from_rbldf(
      _ldf.join(
        other._ldf, [], [], allow_parallel, force_parallel, how, suffix
      )
    )
  end

  if !on.nil?
    rbexprs = Utils.selection_to_rbexpr_list(on)
    rbexprs_left = rbexprs
    rbexprs_right = rbexprs
  elsif !left_on.nil? && !right_on.nil?
    rbexprs_left = Utils.selection_to_rbexpr_list(left_on)
    rbexprs_right = Utils.selection_to_rbexpr_list(right_on)
  else
    raise ArgumentError, "must specify `on` OR `left_on` and `right_on`"
  end

  _from_rbldf(
    self._ldf.join(
      other._ldf,
      rbexprs_left,
      rbexprs_right,
      allow_parallel,
      force_parallel,
      how,
      suffix,
    )
  )
end
#join_asof(other, left_on: nil, right_on: nil, on: nil, by_left: nil, by_right: nil, by: nil, strategy: "backward", suffix: "_right", tolerance: nil, allow_parallel: true, force_parallel: false) ⇒ LazyFrame
Perform an asof join.
This is similar to a left-join except that we match on nearest key rather than equal keys.
Both DataFrames must be sorted by the join_asof key.
For each row in the left DataFrame:
- A "backward" search selects the last row in the right DataFrame whose 'on' key is less than or equal to the left's key.
- A "forward" search selects the first row in the right DataFrame whose 'on' key is greater than or equal to the left's key.
The default is "backward".
# File 'lib/polars/lazy_frame.rb', line 1272
def join_asof(
  other,
  left_on: nil,
  right_on: nil,
  on: nil,
  by_left: nil,
  by_right: nil,
  by: nil,
  strategy: "backward",
  suffix: "_right",
  tolerance: nil,
  allow_parallel: true,
  force_parallel: false
)
  if !other.is_a?(LazyFrame)
    raise ArgumentError, "Expected a `LazyFrame` as join table, got #{other.class.name}"
  end

  if on.is_a?(String)
    left_on = on
    right_on = on
  end

  if left_on.nil? || right_on.nil?
    raise ArgumentError, "You should pass the column to join on as an argument."
  end

  if by_left.is_a?(String) || by_left.is_a?(Expr)
    by_left_ = [by_left]
  else
    by_left_ = by_left
  end

  if by_right.is_a?(String) || by_right.is_a?(Expr)
    by_right_ = [by_right]
  else
    by_right_ = by_right
  end

  if by.is_a?(String)
    by_left_ = [by]
    by_right_ = [by]
  elsif by.is_a?(Array)
    by_left_ = by
    by_right_ = by
  end

  tolerance_str = nil
  tolerance_num = nil
  if tolerance.is_a?(String)
    tolerance_str = tolerance
  else
    tolerance_num = tolerance
  end

  _from_rbldf(
    _ldf.join_asof(
      other._ldf,
      Polars.col(left_on)._rbexpr,
      Polars.col(right_on)._rbexpr,
      by_left_,
      by_right_,
      allow_parallel,
      force_parallel,
      suffix,
      strategy,
      tolerance_num,
      tolerance_str
    )
  )
end
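The "backward" and "forward" searches described above can be sketched on plain sorted arrays of keys. The helper `asof_match_sketch` is a hypothetical name; it assumes both key lists are sorted ascending, returns nil where no match exists, and ignores the by, tolerance and suffix options. It illustrates the matching rule only and is not the library's implementation.

```ruby
# For each left key, pick the matching right key:
#   "backward": the last right key <= the left key
#   "forward":  the first right key >= the left key
# Both arrays are assumed to be sorted in ascending order.
def asof_match_sketch(left_keys, right_keys, strategy: "backward")
  left_keys.map do |key|
    case strategy
    when "backward"
      right_keys.select { |r| r <= key }.last
    when "forward"
      right_keys.find { |r| r >= key }
    else
      raise ArgumentError, "unknown strategy: #{strategy}"
    end
  end
end

p asof_match_sketch([1, 5, 10], [2, 3, 6, 7])
p asof_match_sketch([1, 5, 10], [2, 3, 6, 7], strategy: "forward")
```

As in a left join, every left key yields exactly one output entry; a nil entry corresponds to a left row with no eligible right row.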
#last ⇒ LazyFrame
Get the last row of the DataFrame.
# File 'lib/polars/lazy_frame.rb', line 1865
def last
  tail(1)
end
#lazy ⇒ LazyFrame
Return lazy representation, i.e. itself.
Useful for writing code that expects either a DataFrame or LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 584
def lazy
  self
end
#limit(n = 5) ⇒ LazyFrame
Get the first n rows.
# File 'lib/polars/lazy_frame.rb', line 1833
def limit(n = 5)
  # Pass n through; a hard-coded 5 would ignore the caller's argument.
  head(n)
end
#max ⇒ LazyFrame
Aggregate the columns in the DataFrame to their maximum value.
# File 'lib/polars/lazy_frame.rb', line 2064
def max
  _from_rbldf(_ldf.max)
end
#mean ⇒ LazyFrame
Aggregate the columns in the DataFrame to their mean value.
# File 'lib/polars/lazy_frame.rb', line 2124
def mean
  _from_rbldf(_ldf.mean)
end
#median ⇒ LazyFrame
Aggregate the columns in the DataFrame to their median value.
# File 'lib/polars/lazy_frame.rb', line 2144
def median
  _from_rbldf(_ldf.median)
end
#melt(id_vars: nil, value_vars: nil, variable_name: nil, value_name: nil) ⇒ LazyFrame
Unpivot a DataFrame from wide to long format.
Optionally leaves identifiers set.
This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are "unpivoted" to the row axis, leaving just two non-identifier columns, 'variable' and 'value'.
# File 'lib/polars/lazy_frame.rb', line 2318
def melt(id_vars: nil, value_vars: nil, variable_name: nil, value_name: nil)
  if value_vars.is_a?(String)
    value_vars = [value_vars]
  end
  if id_vars.is_a?(String)
    id_vars = [id_vars]
  end
  if value_vars.nil?
    value_vars = []
  end
  if id_vars.nil?
    id_vars = []
  end
  _from_rbldf(
    _ldf.melt(id_vars, value_vars, value_name, variable_name)
  )
end
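The wide-to-long reshaping described above can be sketched with an Array of Hashes in plain Ruby. The helper `melt_sketch` is hypothetical, uses the default output column names 'variable' and 'value' mentioned in the description, and emits rows one measured column at a time; that row order is an illustrative choice here, not a statement about the library's output order.

```ruby
# Unpivot: keep the identifier columns, and turn each measured column
# into a (variable, value) pair on its own row.
def melt_sketch(rows, id_vars:, value_vars:)
  value_vars.flat_map do |var|
    rows.map do |row|
      row.slice(*id_vars).merge("variable" => var, "value" => row[var])
    end
  end
end

WIDE_ROWS = [
  { "id" => "a", "x" => 1, "y" => 2 },
  { "id" => "b", "x" => 3, "y" => 4 }
]

melt_sketch(WIDE_ROWS, id_vars: ["id"], value_vars: ["x", "y"]).each { |row| p row }
```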
#min ⇒ LazyFrame
Aggregate the columns in the DataFrame to their minimum value.
# File 'lib/polars/lazy_frame.rb', line 2084
def min
  _from_rbldf(_ldf.min)
end
#pipe(func, *args, **kwargs, &block) ⇒ LazyFrame
Offers a structured way to apply a sequence of user-defined functions (UDFs).
# File 'lib/polars/lazy_frame.rb', line 307
def pipe(func, *args, **kwargs, &block)
  func.call(self, *args, **kwargs, &block)
end
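Since the method body only forwards the receiver and arguments to func.call, the chaining pattern can be tried on any object. The sketch below uses a hypothetical stand-in class `PipeableSketch` (not a LazyFrame) and two hypothetical lambdas, `ADD_SKETCH` and `SCALE_SKETCH`, to show how positional and keyword arguments reach each user-defined function.

```ruby
# Minimal stand-in wrapping an Array; only the pipe method mirrors the
# body shown above.
class PipeableSketch
  attr_reader :values

  def initialize(values)
    @values = values
  end

  def pipe(func, *args, **kwargs, &block)
    func.call(self, *args, **kwargs, &block)
  end
end

# Each function receives the piped object first, then the extra arguments.
ADD_SKETCH = ->(frame, amount) { PipeableSketch.new(frame.values.map { |v| v + amount }) }
SCALE_SKETCH = ->(frame, factor: 1) { PipeableSketch.new(frame.values.map { |v| v * factor }) }

result = PipeableSketch.new([1, 2, 3]).pipe(ADD_SKETCH, 1).pipe(SCALE_SKETCH, factor: 10)
p result.values
```

Each function returns an object that itself responds to pipe, which is what allows the calls to be chained left to right instead of nested.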
#quantile(quantile, interpolation: "nearest") ⇒ LazyFrame
Aggregate the columns in the DataFrame to their quantile value.
# File 'lib/polars/lazy_frame.rb', line 2169
def quantile(quantile, interpolation: "nearest")
  quantile = Utils.expr_to_lit_or_expr(quantile, str_to_lit: false)
  _from_rbldf(_ldf.quantile(quantile._rbexpr, interpolation))
end
#rename(mapping) ⇒ LazyFrame
Rename column names.
# File 'lib/polars/lazy_frame.rb', line 1669
def rename(mapping)
  existing = mapping.keys
  _new = mapping.values
  _from_rbldf(_ldf.rename(existing, _new))
end
#reverse ⇒ LazyFrame
Reverse the DataFrame.
# File 'lib/polars/lazy_frame.rb', line 1678
def reverse
  _from_rbldf(_ldf.reverse)
end
#schema ⇒ Hash
Get the schema.
# File 'lib/polars/lazy_frame.rb', line 220
def schema
  _ldf.schema
end
#select(exprs) ⇒ LazyFrame
Select columns from this DataFrame.
# File 'lib/polars/lazy_frame.rb', line 762
def select(exprs)
  exprs = Utils.selection_to_rbexpr_list(exprs)
  _from_rbldf(_ldf.select(exprs))
end
#shift(periods) ⇒ LazyFrame
Shift the values by a given period.
# File 'lib/polars/lazy_frame.rb', line 1726
def shift(periods)
  _from_rbldf(_ldf.shift(periods))
end
#shift_and_fill(periods, fill_value) ⇒ LazyFrame
Shift the values by a given period and fill the resulting null values.
# File 'lib/polars/lazy_frame.rb', line 1776
def shift_and_fill(periods, fill_value)
  if !fill_value.is_a?(Expr)
    fill_value = Polars.lit(fill_value)
  end
  _from_rbldf(_ldf.shift_and_fill(periods, fill_value._rbexpr))
end
#slice(offset, length = nil) ⇒ LazyFrame
Get a slice of this DataFrame.
# File 'lib/polars/lazy_frame.rb', line 1813
def slice(offset, length = nil)
  if length && length < 0
    raise ArgumentError, "Negative slice lengths (#{length}) are invalid for LazyFrame"
  end
  _from_rbldf(_ldf.slice(offset, length))
end
#sort(by, reverse: false, nulls_last: false) ⇒ LazyFrame
Sort the DataFrame.
Sorting can be done by:
- A single column name
- An expression
- Multiple expressions
# File 'lib/polars/lazy_frame.rb', line 385
def sort(by, reverse: false, nulls_last: false)
  if by.is_a?(String)
    # Return early; without it the result of this branch is discarded.
    return _from_rbldf(_ldf.sort(by, reverse, nulls_last))
  end
  if Utils.bool?(reverse)
    reverse = [reverse]
  end

  by = Utils.selection_to_rbexpr_list(by)
  _from_rbldf(_ldf.sort_by_exprs(by, reverse, nulls_last))
end
#std(ddof: 1) ⇒ LazyFrame
Aggregate the columns in the DataFrame to their standard deviation value.
# File 'lib/polars/lazy_frame.rb', line 2012
def std(ddof: 1)
  _from_rbldf(_ldf.std(ddof))
end
#sum ⇒ LazyFrame
Aggregate the columns in the DataFrame to their sum value.
# File 'lib/polars/lazy_frame.rb', line 2104
def sum
  _from_rbldf(_ldf.sum)
end
#tail(n = 5) ⇒ LazyFrame
Get the last n rows.
# File 'lib/polars/lazy_frame.rb', line 1858
def tail(n = 5)
  _from_rbldf(_ldf.tail(n))
end
#take_every(n) ⇒ LazyFrame
Take every nth row in the LazyFrame and return as a new LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 1932
def take_every(n)
  select(Utils.col("*").take_every(n))
end
#to_s ⇒ String
Returns a string representing the LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 251
def to_s
  <<~EOS
    naive plan: (run LazyFrame#describe_optimized_plan to see the optimized plan)

    #{describe_plan}
  EOS
end
#unique(maintain_order: true, subset: nil, keep: "first") ⇒ LazyFrame
Drop duplicate rows from this DataFrame.
Note that this fails if there is a column of type List in the DataFrame or subset.
# File 'lib/polars/lazy_frame.rb', line 2228
def unique(maintain_order: true, subset: nil, keep: "first")
  if !subset.nil? && !subset.is_a?(Array)
    subset = [subset]
  end
  _from_rbldf(_ldf.unique(maintain_order, subset, keep))
end
#unnest(names) ⇒ LazyFrame
Decompose a struct into its fields.
The fields will be inserted into the DataFrame at the location of the struct type.
# File 'lib/polars/lazy_frame.rb', line 2424
def unnest(names)
  if names.is_a?(String)
    names = [names]
  end
  _from_rbldf(_ldf.unnest(names))
end
#var(ddof: 1) ⇒ LazyFrame
Aggregate the columns in the DataFrame to their variance value.
# File 'lib/polars/lazy_frame.rb', line 2044
def var(ddof: 1)
  _from_rbldf(_ldf.var(ddof))
end
#width ⇒ Integer
Get the width of the LazyFrame.
# File 'lib/polars/lazy_frame.rb', line 232
def width
  _ldf.width
end
#with_column(column) ⇒ LazyFrame
Add or overwrite column in a DataFrame.
# File 'lib/polars/lazy_frame.rb', line 1645
def with_column(column)
  with_columns([column])
end
#with_columns(exprs) ⇒ LazyFrame
Add or overwrite multiple columns in a DataFrame.
# File 'lib/polars/lazy_frame.rb', line 1537
def with_columns(exprs)
  exprs =
    if exprs.nil?
      []
    elsif exprs.is_a?(Expr)
      [exprs]
    else
      exprs.to_a
    end

  rbexprs = []
  exprs.each do |e|
    case e
    when Expr
      rbexprs << e._rbexpr
    when Series
      # Append the literal; assigning here would replace the whole list.
      rbexprs << Utils.lit(e)._rbexpr
    else
      raise ArgumentError, "Expected an expression, got #{e}"
    end
  end

  _from_rbldf(_ldf.with_columns(rbexprs))
end
#with_context(other) ⇒ LazyFrame
Add an external context to the computation graph.
This allows expressions to also access columns from DataFrames that are not part of this one.
# File 'lib/polars/lazy_frame.rb', line 1593
def with_context(other)
  if !other.is_a?(Array)
    other = [other]
  end

  _from_rbldf(_ldf.with_context(other.map(&:_ldf)))
end
#with_row_count(name: "row_nr", offset: 0) ⇒ LazyFrame
Add a column at index 0 that counts the rows.
Note: this can have a negative effect on query performance. It may, for instance, block predicate pushdown optimization.
# File 'lib/polars/lazy_frame.rb', line 1910
def with_row_count(name: "row_nr", offset: 0)
  _from_rbldf(_ldf.with_row_count(name, offset))
end
#write_json(file) ⇒ nil
Write the logical plan of this LazyFrame to a file or string in JSON format.
# File 'lib/polars/lazy_frame.rb', line 265
def write_json(file)
  if file.is_a?(String) || (defined?(Pathname) && file.is_a?(Pathname))
    file = Utils.format_path(file)
  end
  _ldf.write_json(file)
  nil
end