Module: Polars::Selectors
Defined in: lib/polars/selectors.rb
Class Method Summary
- .all ⇒ Selector
  Select all columns.
- .alpha(ascii_only: false, ignore_spaces: false) ⇒ Selector
  Select all columns with alphabetic names (e.g., only letters).
- .alphanumeric(ascii_only: false, ignore_spaces: false) ⇒ Selector
  Select all columns with alphanumeric names (e.g., only letters and the digits 0-9).
- .array(inner = nil, width: nil) ⇒ Selector
  Select all array columns.
- .binary ⇒ Selector
  Select all binary columns.
- .boolean ⇒ Selector
  Select all boolean columns.
- .by_dtype(*dtypes) ⇒ Selector
  Select all columns matching the given dtypes.
- .by_index(*indices, require_all: true) ⇒ Selector
  Select all columns matching the given indices (or range objects).
- .by_name(*names, require_all: true) ⇒ Selector
  Select all columns matching the given names.
- .categorical ⇒ Selector
  Select all categorical columns.
- .contains(*substring) ⇒ Selector
  Select columns whose names contain the given literal substring(s).
- .date ⇒ Selector
  Select all date columns.
- .datetime ⇒ Selector
  Select all datetime columns, optionally filtering by time unit/zone.
- .decimal ⇒ Selector
  Select all decimal columns.
- .digit(ascii_only: false) ⇒ Selector
  Select all columns having names consisting only of digits.
- .duration ⇒ Selector
  Select all duration columns, optionally filtering by time unit.
- .empty ⇒ Selector
  Select no columns.
- .ends_with(*suffix) ⇒ Selector
  Select columns whose names end with the given substring(s).
- .enum ⇒ Selector
  Select all enum columns.
- .exclude(columns, *more_columns) ⇒ Selector
  Select all columns except those matching the given columns, datatypes, or selectors.
- .first(strict: true) ⇒ Selector
  Select the first column in the current scope.
- .float ⇒ Selector
  Select all float columns.
- .integer ⇒ Selector
  Select all integer columns.
- .last(strict: true) ⇒ Selector
  Select the last column in the current scope.
- .list(inner = nil) ⇒ Selector
  Select all list columns.
- .matches(pattern) ⇒ Selector
  Select all columns that match the given regex pattern.
- .nested ⇒ Selector
  Select all nested columns.
- .numeric ⇒ Selector
  Select all numeric columns.
- .object ⇒ Selector
  Select all object columns.
- .signed_integer ⇒ Selector
  Select all signed integer columns.
- .starts_with(*prefix) ⇒ Selector
  Select columns whose names start with the given substring(s).
- .string(include_categorical: false) ⇒ Selector
  Select all String (and, optionally, Categorical) columns.
- .struct ⇒ Selector
  Select all struct columns.
- .temporal ⇒ Selector
  Select all temporal columns.
- .time ⇒ Selector
  Select all time columns.
- .unsigned_integer ⇒ Selector
  Select all unsigned integer columns.
Class Method Details
.all ⇒ Selector
Select all columns.
# File 'lib/polars/selectors.rb', line 76
def self.all
  Selector._from_rbselector(RbSelector.all)
end
.alpha(ascii_only: false, ignore_spaces: false) ⇒ Selector
Matching column names cannot contain any non-alphabetic characters. By default, "alphabetic" covers all valid Unicode alphabetic characters (\p{Alphabetic}); this can be changed by setting ascii_only: true.
Select all columns with alphabetic names (e.g., only letters).
# File 'lib/polars/selectors.rb', line 177
def self.alpha(ascii_only: false, ignore_spaces: false)
  # note that we need to supply a pattern compatible with the *rust* regex crate
  re_alpha = ascii_only ? "a-zA-Z" : "\\p{Alphabetic}"
  re_space = ignore_spaces ? " " : ""
  Selector._from_rbselector(RbSelector.matches("^[#{re_alpha}#{re_space}]+$"))
end
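To see what the two options do to the generated pattern, here is a minimal standalone sketch. It does not call Polars: it rebuilds the same pattern string and tests names with Ruby's own Regexp engine, which only approximates the Rust regex crate that Polars actually uses. The helper names alpha_pattern and alpha_names and the sample column names are assumptions made for illustration.

```ruby
# Rebuild the pattern string the same way the alpha source above does.
def alpha_pattern(ascii_only: false, ignore_spaces: false)
  re_alpha = ascii_only ? "a-zA-Z" : "\\p{Alphabetic}"
  re_space = ignore_spaces ? " " : ""
  "^[#{re_alpha}#{re_space}]+$"
end

# Hypothetical helper: which of the given names would the pattern accept?
def alpha_names(names, **opts)
  # Ruby's ^ and $ anchor lines, so swap them for \A and \z to anchor the
  # whole name (an adaptation for Ruby, not part of the Polars source).
  body = alpha_pattern(**opts).delete_prefix("^").delete_suffix("$")
  rx = Regexp.new("\\A#{body}\\z")
  names.select { |name| rx.match?(name) }
end

names = ["no1", "élan", "two words", "abc"]
p alpha_names(names)                       # ["élan", "abc"]
p alpha_names(names, ascii_only: true)     # ["abc"]
p alpha_names(names, ignore_spaces: true)  # ["élan", "two words", "abc"]
```

With the defaults, a name containing a digit or a space is rejected, while accented letters are accepted because they are Unicode alphabetic characters.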
.alphanumeric(ascii_only: false, ignore_spaces: false) ⇒ Selector
Matching column names can contain only alphabetic and digit characters. By default, "alphabetic" covers all valid Unicode alphabetic characters (\p{Alphabetic}) and "digit" covers all valid Unicode digit characters (\d); this can be changed by setting ascii_only: true.
Select all columns with alphanumeric names (e.g., only letters and the digits 0-9).
# File 'lib/polars/selectors.rb', line 264
def self.alphanumeric(ascii_only: false, ignore_spaces: false)
  # note that we need to supply patterns compatible with the *rust* regex crate
  re_alpha = ascii_only ? "a-zA-Z" : "\\p{Alphabetic}"
  re_digit = ascii_only ? "0-9" : "\\d"
  re_space = ignore_spaces ? " " : ""
  return Selector._from_rbselector(
    RbSelector.matches("^[#{re_alpha}#{re_digit}#{re_space}]+$")
  )
end
.array(inner = nil, width: nil) ⇒ Selector
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
Select all array columns.
# File 'lib/polars/selectors.rb', line 795
def self.array(inner = nil, width: nil)
  inner_s = !inner.nil? ? inner._rbselector : nil
  Selector._from_rbselector(RbSelector.array(inner_s, width))
end
.binary ⇒ Selector
Select all binary columns.
# File 'lib/polars/selectors.rb', line 297
def self.binary
  by_dtype([Binary])
end
.boolean ⇒ Selector
Select all boolean columns.
# File 'lib/polars/selectors.rb', line 349
def self.boolean
  by_dtype([Boolean])
end
.by_dtype(*dtypes) ⇒ Selector
Select all columns matching the given dtypes.
Group by string columns and sum the numeric columns:

  df.group_by(Polars.cs.string).agg(Polars.cs.numeric.sum).sort("other")
  # =>
  # shape: (2, 2)
  # ┌───────┬──────────┐
  # │ other ┆ value    │
  # │ ---   ┆ ---      │
  # │ str   ┆ i64      │
  # ╞═══════╪══════════╡
  # │ bar   ┆ 5000555  │
  # │ foo   ┆ -3265500 │
  # └───────┴──────────┘
# File 'lib/polars/selectors.rb', line 404
def self.by_dtype(*dtypes)
  all_dtypes = []
  dtypes.each do |tp|
    if Utils.is_polars_dtype(tp) || tp.is_a?(Class)
      all_dtypes << tp
    elsif tp.is_a?(::Array)
      tp.each do |t|
        if !(Utils.is_polars_dtype(t) || t.is_a?(Class))
          msg = "invalid dtype: #{t.inspect}"
          raise TypeError, msg
        end
        all_dtypes << t
      end
    else
      msg = "invalid dtype: #{tp.inspect}"
      raise TypeError, msg
    end
  end

  Selector._by_dtype(all_dtypes)
end
.by_index(*indices, require_all: true) ⇒ Selector
Matching columns are returned in the order in which their indexes appear in the selector, not the underlying schema order.
Select all columns matching the given indices (or range objects).
# File 'lib/polars/selectors.rb', line 502
def self.by_index(*indices, require_all: true)
  all_indices = []
  indices.each do |idx|
    if idx.is_a?(Enumerable)
      all_indices.concat(idx.to_a)
    elsif idx.is_a?(Integer)
      all_indices << idx
    else
      msg = "invalid index value: #{idx.inspect}"
      raise TypeError, msg
    end
  end

  Selector._from_rbselector(RbSelector.by_index(all_indices, require_all))
end
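The argument handling and the ordering rule for by_index can be sketched in plain Ruby. This is a standalone model that does not call Polars: flatten_indices is a hypothetical name that mirrors the loop in the source, and the five-column schema is invented for illustration.

```ruby
# Mirror of the argument loop in by_index: ranges (and other Enumerables)
# are expanded, integers are kept, anything else raises TypeError.
def flatten_indices(*indices)
  all_indices = []
  indices.each do |idx|
    if idx.is_a?(Enumerable)
      all_indices.concat(idx.to_a)
    elsif idx.is_a?(Integer)
      all_indices << idx
    else
      raise TypeError, "invalid index value: #{idx.inspect}"
    end
  end
  all_indices
end

# Hypothetical schema, used only to show the ordering rule.
columns = ["a", "b", "c", "d", "e"]

picked = flatten_indices(3, 0..1)
p picked                      # [3, 0, 1]
p columns.values_at(*picked)  # ["d", "a", "b"]
```

The result follows the order in which the indices were given (3, then 0 and 1), not the order of the columns in the schema.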
.by_name(*names, require_all: true) ⇒ Selector
Matching columns are returned in the order in which they are declared in the selector, not the underlying schema order.
Select all columns matching the given names.
# File 'lib/polars/selectors.rb', line 579
def self.by_name(*names, require_all: true)
  all_names = []
  names.each do |nm|
    if nm.is_a?(::String)
      all_names << nm
    elsif nm.is_a?(::Array)
      nm.each do |n|
        if !n.is_a?(::String)
          msg = "invalid name: #{n.inspect}"
          raise TypeError, msg
        end
        all_names << n
      end
    else
      msg = "invalid name: #{nm.inspect}"
      raise TypeError, msg
    end
  end

  Selector._by_name(all_names, strict: require_all)
end
.categorical ⇒ Selector
Select all categorical columns.
# File 'lib/polars/selectors.rb', line 930
def self.categorical
  Selector._from_rbselector(RbSelector.categorical)
end
.contains(*substring) ⇒ Selector
Select columns whose names contain the given literal substring(s).
# File 'lib/polars/selectors.rb', line 989
def self.contains(*substring)
  escaped_substring = _re_string(substring)
  raw_params = "^.*#{escaped_substring}.*$"

  Selector._from_rbselector(RbSelector.matches(raw_params))
end
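The private helper _re_string is not shown in this section, so the following is only a sketch of how literal substrings could be turned into the pattern that contains builds. It assumes the helper escapes each substring and joins the alternatives with "|"; literal_alternation and contains_pattern are hypothetical names, and matching is done with Ruby's Regexp rather than the Rust regex crate.

```ruby
# Assumed behaviour of the private helper: escape every literal so regex
# metacharacters lose their meaning, then join the alternatives with "|".
def literal_alternation(substrings)
  "(" + substrings.flatten.map { |s| Regexp.escape(s) }.join("|") + ")"
end

# Same wrapping as in the contains source: anything, the literal, anything.
def contains_pattern(*substring)
  "^.*#{literal_alternation(substring)}.*$"
end

pattern = contains_pattern("a.b", "rate")
puts pattern  # ^.*(a\.b|rate).*$

rx = Regexp.new(pattern)
names = ["a.b_total", "axb_total", "growth_rate"]
p names.select { |n| rx.match?(n) }  # ["a.b_total", "growth_rate"]
```

Because the dot is escaped, "axb_total" does not match: the substrings are treated as literals, not as regular expressions.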
.date ⇒ Selector
Select all date columns.
# File 'lib/polars/selectors.rb', line 1033
def self.date
  by_dtype([Date])
end
.datetime ⇒ Selector
Select all datetime columns. The signature shown here takes no arguments; the time units ("ms", "us", "ns") and the time zone filter are fixed in the implementation.
# File 'lib/polars/selectors.rb', line 1040
def self.datetime
  time_unit = ["ms", "us", "ns"]

  time_zone = [nil]

  Selector._from_rbselector(RbSelector.datetime(time_unit, time_zone))
end
.decimal ⇒ Selector
Select all decimal columns.
# File 'lib/polars/selectors.rb', line 1088
def self.decimal
  # TODO: allow explicit selection by scale/precision?
  Selector._from_rbselector(RbSelector.decimal)
end
.digit(ascii_only: false) ⇒ Selector
Matching column names cannot contain any non-digit characters. By default, "digit" covers all valid Unicode digit characters (\d); this can be changed by setting ascii_only: true.
Select all columns having names consisting only of digits.
# File 'lib/polars/selectors.rb', line 1176
def self.digit(ascii_only: false)
  re_digit = ascii_only ? "[0-9]" : "\\d"
  Selector._from_rbselector(RbSelector.matches("^#{re_digit}+$"))
end
.duration ⇒ Selector
Select all duration columns. The signature shown here takes no arguments; the time units ("ms", "us", "ns") are fixed in the implementation.
# File 'lib/polars/selectors.rb', line 1184
def self.duration
  time_unit = ["ms", "us", "ns"]

  Selector._from_rbselector(RbSelector.duration(time_unit))
end
.empty ⇒ Selector
Select no columns.
This is useful for composition with other selectors.
# File 'lib/polars/selectors.rb', line 34
def self.empty
  Selector._from_rbselector(RbSelector.empty)
end
.ends_with(*suffix) ⇒ Selector
Select columns whose names end with the given substring(s).
# File 'lib/polars/selectors.rb', line 1245
def self.ends_with(*suffix)
  escaped_suffix = _re_string(suffix)
  raw_params = "^.*#{escaped_suffix}$"

  Selector._from_rbselector(RbSelector.matches(raw_params))
end
.enum ⇒ Selector
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
Select all enum columns.
# File 'lib/polars/selectors.rb', line 642
def self.enum
  Selector._from_rbselector(RbSelector.enum_)
end
.exclude(columns, *more_columns) ⇒ Selector
If excluding a single selector, it is simpler to write ~selector instead.
Select all columns except those matching the given columns, datatypes, or selectors.
# File 'lib/polars/selectors.rb', line 1300
def self.exclude(columns, *more_columns)
  ~_combine_as_selector(columns, *more_columns)
end
.first(strict: true) ⇒ Selector
Select the first column in the current scope.
# File 'lib/polars/selectors.rb', line 1343
def self.first(strict: true)
  Selector._from_rbselector(RbSelector.first(strict))
end
.float ⇒ Selector
Select all float columns.
# File 'lib/polars/selectors.rb', line 1387
def self.float
  Selector._from_rbselector(RbSelector.float)
end
.integer ⇒ Selector
Select all integer columns.
# File 'lib/polars/selectors.rb', line 1430
def self.integer
  Selector._from_rbselector(RbSelector.integer)
end
.last(strict: true) ⇒ Selector
Select the last column in the current scope.
# File 'lib/polars/selectors.rb', line 1587
def self.last(strict: true)
  Selector._from_rbselector(RbSelector.last(strict))
end
.list(inner = nil) ⇒ Selector
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
Select all list columns.
# File 'lib/polars/selectors.rb', line 707
def self.list(inner = nil)
  inner_s = !inner.nil? ? inner._rbselector : nil
  Selector._from_rbselector(RbSelector.list(inner_s))
end
.matches(pattern) ⇒ Selector
Select all columns that match the given regex pattern.
# File 'lib/polars/selectors.rb', line 1631
def self.matches(pattern)
  if pattern == ".*"
    all
  else
    if pattern.start_with?(".*")
      pattern = pattern[2..]
    elsif pattern.end_with?(".*")
      pattern = pattern[..-3]
    end

    pfx = !pattern.start_with?("^") ? "^.*" : ""
    sfx = !pattern.end_with?("$") ? ".*$" : ""
    raw_params = "#{pfx}#{pattern}#{sfx}"
    Selector._from_rbselector(RbSelector.matches(raw_params))
  end
end
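The way matches rewrites a pattern before handing it to the regex engine is easier to follow with concrete inputs. The sketch below is a standalone copy of that string handling and does not call Polars; normalized_pattern is a hypothetical name, and the :all symbol is only a placeholder for the all selector returned in the ".*" case.

```ruby
# Standalone copy of the pattern handling in matches. Returns the string
# that would be passed on to the regex engine, or :all for ".*".
def normalized_pattern(pattern)
  return :all if pattern == ".*"

  # Strip a leading or trailing ".*"; it is re-added below as needed.
  if pattern.start_with?(".*")
    pattern = pattern[2..]
  elsif pattern.end_with?(".*")
    pattern = pattern[..-3]
  end

  # Unanchored patterns are padded so they can match anywhere in the name.
  pfx = pattern.start_with?("^") ? "" : "^.*"
  sfx = pattern.end_with?("$") ? "" : ".*$"
  "#{pfx}#{pattern}#{sfx}"
end

p normalized_pattern(".*")      # :all
p normalized_pattern("price")   # "^.*price.*$"
p normalized_pattern("^price")  # "^price.*$"
p normalized_pattern("_id$")    # "^.*_id$"
p normalized_pattern(".*_id")   # "^.*_id.*$"
```

An unanchored pattern therefore behaves like a substring search, while an explicit ^ or $ in the input is kept as written.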
.nested ⇒ Selector
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
Select all nested columns.
A nested column is a list, array or struct.
# File 'lib/polars/selectors.rb', line 887
def self.nested
  Selector._from_rbselector(RbSelector.nested)
end
.numeric ⇒ Selector
Select all numeric columns.
# File 'lib/polars/selectors.rb', line 1689
def self.numeric
  Selector._from_rbselector(RbSelector.numeric)
end
.object ⇒ Selector
Select all object columns.
# File 'lib/polars/selectors.rb', line 1696
def self.object
  Selector._from_rbselector(RbSelector.object)
end
.signed_integer ⇒ Selector
Select all signed integer columns.
# File 'lib/polars/selectors.rb', line 1487
def self.signed_integer
  Selector._from_rbselector(RbSelector.signed_integer)
end
.starts_with(*prefix) ⇒ Selector
Select columns whose names start with the given substring(s).
# File 'lib/polars/selectors.rb', line 1755
def self.starts_with(*prefix)
  escaped_prefix = _re_string(prefix)
  raw_params = "^#{escaped_prefix}.*$"

  Selector._from_rbselector(RbSelector.matches(raw_params))
end
.string(include_categorical: false) ⇒ Selector
Select all String (and, optionally, Categorical) columns.

  df.group_by(Polars.cs.string).agg(Polars.cs.numeric.sum).sort(Polars.cs.string)
  # =>
  # shape: (2, 3)
  # ┌─────┬─────┬─────┐
  # │ w   ┆ x   ┆ y   │
  # │ --- ┆ --- ┆ --- │
  # │ str ┆ i64 ┆ f64 │
  # ╞═════╪═════╪═════╡
  # │ xx  ┆ 0   ┆ 2.0 │
  # │ yy  ┆ 6   ┆ 7.0 │
  # └─────┴─────┴─────┘
# File 'lib/polars/selectors.rb', line 1805
def self.string(include_categorical: false)
  string_dtypes = [String]
  if include_categorical
    string_dtypes << Categorical
  end

  by_dtype(string_dtypes)
end
.struct ⇒ Selector
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
Select all struct columns.
# File 'lib/polars/selectors.rb', line 840
def self.struct
  Selector._from_rbselector(RbSelector.struct_)
end
.temporal ⇒ Selector
Select all temporal columns.
# File 'lib/polars/selectors.rb', line 1864
def self.temporal
  Selector._from_rbselector(RbSelector.temporal)
end