Class: Polars::StringExpr
- Inherits: Object
  - Object
  - Polars::StringExpr
- Defined in: lib/polars/string_expr.rb
Overview
Namespace for string related expressions.
Instance Method Summary
-
#concat(delimiter = "-") ⇒ Expr
Vertically concat the values in the Series to a single string value.
-
#contains(pattern, literal: false) ⇒ Expr
Check if string contains a substring that matches a regex.
-
#count_match(pattern) ⇒ Expr
Count all successive non-overlapping regex matches.
-
#decode(encoding, strict: false) ⇒ Expr
Decode a value using the provided encoding.
-
#encode(encoding) ⇒ Expr
Encode a value using the provided encoding.
-
#ends_with(sub) ⇒ Expr
Check if string values end with a substring.
-
#extract(pattern, group_index: 1) ⇒ Expr
Extract the target capture group from provided patterns.
-
#extract_all(pattern) ⇒ Expr
Extracts all matches for the given regex pattern.
-
#json_path_match(json_path) ⇒ Expr
Extract the first match of json string with provided JSONPath expression.
-
#lengths ⇒ Expr
Get length of the strings as :u32 (as number of bytes).
-
#ljust(width, fillchar = " ") ⇒ Expr
Return the string left justified in a string of length width.
-
#lstrip(matches = nil) ⇒ Expr
Remove leading whitespace.
-
#n_chars ⇒ Expr
Get length of the strings as :u32 (as number of chars).
-
#replace(pattern, value, literal: false) ⇒ Expr
Replace first matching regex/literal substring with a new string value.
-
#replace_all(pattern, value, literal: false) ⇒ Expr
Replace all matching regex/literal substrings with a new string value.
-
#rjust(width, fillchar = " ") ⇒ Expr
Return the string right justified in a string of length width.
-
#rstrip(matches = nil) ⇒ Expr
Remove trailing whitespace.
-
#slice(offset, length = nil) ⇒ Expr
Create subslices of the string values of a Utf8 Series.
-
#split(by, inclusive: false) ⇒ Expr
Split the string by a substring.
-
#split_exact(by, n, inclusive: false) ⇒ Expr
Split the string by a substring using n splits.
-
#splitn(by, n) ⇒ Expr
Split the string by a substring, restricted to returning at most n items.
-
#starts_with(sub) ⇒ Expr
Check if string values start with a substring.
-
#strip(matches = nil) ⇒ Expr
Remove leading and trailing whitespace.
-
#strptime(datatype, fmt = nil, strict: true, exact: true, cache: true, tz_aware: false) ⇒ Expr
Parse a Utf8 expression to a Date/Datetime/Time type.
-
#to_lowercase ⇒ Expr
Transform to lowercase variant.
-
#to_uppercase ⇒ Expr
Transform to uppercase variant.
-
#zfill(alignment) ⇒ Expr
Fills the string with zeroes.
Instance Method Details
#concat(delimiter = "-") ⇒ Expr
Vertically concat the values in the Series to a single string value.
# File 'lib/polars/string_expr.rb', line 179
def concat(delimiter = "-")
  Utils.wrap_expr(_rbexpr.str_concat(delimiter))
end
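A vertical concat takes many values and produces one string. The following is a minimal plain-Ruby sketch of that idea using `Array#join`; `vertical_concat` is a hypothetical helper, not part of Polars, and null handling is not modelled because the text does not describe it.

```ruby
# Plain-Ruby model of a vertical concat: many values in, one string out.
# `vertical_concat` is a hypothetical helper, not a Polars method.
def vertical_concat(values, delimiter = "-")
  values.join(delimiter)
end

puts vertical_concat(%w[a b c])        # a-b-c
puts vertical_concat(%w[a b c], ", ")  # a, b, c
```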
#contains(pattern, literal: false) ⇒ Expr
Check if string contains a substring that matches a regex.
# File 'lib/polars/string_expr.rb', line 470
def contains(pattern, literal: false)
  Utils.wrap_expr(_rbexpr.str_contains(pattern, literal))
end
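The `literal` flag decides whether `pattern` is read as a regex or as a plain substring. A rough plain-Ruby sketch of the difference, assuming Ruby's own regex engine as a stand-in (Polars uses a different engine, so syntax details may differ); `contains_sketch` is a hypothetical name:

```ruby
# Hypothetical helper that models regex vs. literal matching on one string.
def contains_sketch(value, pattern, literal: false)
  if literal
    value.include?(pattern)            # plain substring test
  else
    value.match?(Regexp.new(pattern))  # pattern interpreted as a regex
  end
end

puts contains_sketch("abc", "a.c")                 # true: "." matches any character
puts contains_sketch("abc", "a.c", literal: true)  # false: no literal "a.c" in "abc"
```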
#count_match(pattern) ⇒ Expr
Count all successive non-overlapping regex matches.
# File 'lib/polars/string_expr.rb', line 757
def count_match(pattern)
  Utils.wrap_expr(_rbexpr.count_match(pattern))
end
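"Successive non-overlapping" means that once a match is consumed, the search resumes after it. Ruby's `String#scan` works the same way, so it can serve as a quick model; `count_match_sketch` is a hypothetical helper and the patterns avoid capture groups to keep `scan` returning whole matches.

```ruby
# Hypothetical helper: count non-overlapping regex matches in one string.
def count_match_sketch(value, pattern)
  value.scan(Regexp.new(pattern)).length
end

puts count_match_sketch("aaaa", "aa")            # 2, not 3: matches do not overlap
puts count_match_sketch("123 bla 45 asd", '\d')  # 5
```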
#decode(encoding, strict: false) ⇒ Expr
Decode a value using the provided encoding.
# File 'lib/polars/string_expr.rb', line 623
def decode(encoding, strict: false)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_decode(strict))
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_decode(strict))
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end
#encode(encoding) ⇒ Expr
Encode a value using the provided encoding.
# File 'lib/polars/string_expr.rb', line 656
def encode(encoding)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_encode)
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_encode)
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end
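Both `encode` and `decode` dispatch on the `encoding` string and reject anything other than "hex" or "base64". The sketch below mirrors that dispatch on a single Ruby string using `pack`/`unpack1`; the `_sketch` helpers are hypothetical, and the `strict` option of `decode` is not modelled.

```ruby
# Hypothetical helpers that mirror the hex/base64 dispatch on one string.
def encode_sketch(value, encoding)
  case encoding
  when "hex"    then value.unpack1("H*")   # bytes -> hex digits
  when "base64" then [value].pack("m0")    # bytes -> base64, no line breaks
  else raise ArgumentError, "encoding must be 'hex' or 'base64', got #{encoding}"
  end
end

def decode_sketch(value, encoding)
  case encoding
  when "hex"    then [value].pack("H*")    # hex digits -> bytes
  when "base64" then value.unpack1("m0")   # base64 -> bytes
  else raise ArgumentError, "encoding must be 'hex' or 'base64', got #{encoding}"
  end
end

puts encode_sketch("foo", "hex")      # 666f6f
puts encode_sketch("foo", "base64")   # Zm9v
puts decode_sketch("666f6f", "hex")   # foo
```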
#ends_with(sub) ⇒ Expr
Check if string values end with a substring.
# File 'lib/polars/string_expr.rb', line 511
def ends_with(sub)
  Utils.wrap_expr(_rbexpr.str_ends_with(sub))
end
#extract(pattern, group_index: 1) ⇒ Expr
Extract the target capture group from provided patterns.
# File 'lib/polars/string_expr.rb', line 695
def extract(pattern, group_index: 1)
  Utils.wrap_expr(_rbexpr.str_extract(pattern, group_index))
end
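To illustrate what selecting a capture group means, here is a plain-Ruby sketch using `String#[]` with a regex and a group number. `extract_sketch` and the sample URL are illustrative only; treating group 0 as the whole match is standard regex behaviour and is assumed here rather than stated in the text.

```ruby
# Hypothetical helper: return one capture group of the first match, or nil.
def extract_sketch(value, pattern, group_index: 1)
  value[Regexp.new(pattern), group_index]
end

url = "http://vote.com/ballon_dor?candidate=messi&ref=polars"
puts extract_sketch(url, 'candidate=(\w+)')                  # messi
puts extract_sketch(url, 'candidate=(\w+)', group_index: 0)  # candidate=messi
```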
#extract_all(pattern) ⇒ Expr
Extracts all matches for the given regex pattern.
Extracts each successive non-overlapping regex match in an individual string as an array.
# File 'lib/polars/string_expr.rb', line 727
def extract_all(pattern)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_extract_all(pattern._rbexpr))
end
#json_path_match(json_path) ⇒ Expr
Extract the first match of json string with provided JSONPath expression.
Raises an error if it encounters invalid JSON strings. All return values are cast to Utf8 regardless of the original value.
Documentation on the JSONPath standard can be found here.
# File 'lib/polars/string_expr.rb', line 591
def json_path_match(json_path)
  Utils.wrap_expr(_rbexpr.str_json_path_match(json_path))
end
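The two documented behaviours (invalid JSON raises, and matched values come back as strings) can be sketched in plain Ruby with the standard `json` library. This is only a model under strong assumptions: `json_path_match_sketch` is hypothetical, it understands only simple dotted paths such as `$.a.b`, and it handles only scalar matches.

```ruby
require "json"

# Hypothetical helper: supports only "$.key" / "$.key.key" style paths.
def json_path_match_sketch(json, json_path)
  keys = json_path.delete_prefix('$.').split(".")
  value = JSON.parse(json).dig(*keys)  # raises JSON::ParserError on bad input
  value.nil? ? nil : value.to_s        # scalar results are returned as strings
end

puts json_path_match_sketch('{"a": 1}', '$.a')              # 1 (a String)
puts json_path_match_sketch('{"a": {"b": true}}', '$.a.b')  # true (a String)
```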
#lengths ⇒ Expr
Get length of the strings as :u32 (as number of bytes).
Note: The returned lengths are equal to the number of bytes in the UTF-8 string. If you need the length in terms of the number of characters, use n_chars instead.
# File 'lib/polars/string_expr.rb', line 121
def lengths
  Utils.wrap_expr(_rbexpr.str_lengths)
end
#ljust(width, fillchar = " ") ⇒ Expr
Return the string left justified in a string of length width.
Padding is done using the specified fillchar.
The original string is returned if width is less than or equal to s.length.
# File 'lib/polars/string_expr.rb', line 398
def ljust(width, fillchar = " ")
  Utils.wrap_expr(_rbexpr.str_ljust(width, fillchar))
end
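The padding rule can be written out for a single string as follows. This is a plain-Ruby sketch with a hypothetical `ljust_sketch` helper, assuming a one-character `fillchar`; Ruby's built-in `String#ljust` gives the same results for these inputs.

```ruby
# Hypothetical helper spelling out the left-justify rule for one string.
def ljust_sketch(s, width, fillchar = " ")
  return s if width <= s.length          # already wide enough: unchanged
  s + fillchar * (width - s.length)      # pad on the right up to `width`
end

puts ljust_sketch("abc", 6, "*")     # abc***
puts ljust_sketch("abcdef", 3, "*")  # abcdef
```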
#lstrip(matches = nil) ⇒ Expr
Remove leading whitespace.
# File 'lib/polars/string_expr.rb', line 280
def lstrip(matches = nil)
  if !matches.nil? && matches.length > 1
    raise ArgumentError, "matches should contain a single character"
  end
  Utils.wrap_expr(_rbexpr.str_lstrip(matches))
end
#n_chars ⇒ Expr
Get length of the strings as :u32 (as number of chars).
Note: If you know that you are working with ASCII text, lengths will be equivalent, and faster (returns length in terms of the number of bytes).
# File 'lib/polars/string_expr.rb', line 156
def n_chars
  Utils.wrap_expr(_rbexpr.str_n_chars)
end
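The byte/character distinction behind lengths and n_chars is easy to see on a single Ruby string, where `String#bytesize` counts bytes and `String#length` counts characters. The helper names below are hypothetical and only model the distinction for UTF-8 strings.

```ruby
# Hypothetical helpers modelling the two notions of length on one string.
def byte_length(s)
  s.bytesize   # number of bytes in the encoded string
end

def char_length(s)
  s.length     # number of characters
end

word = "caf\u00e9"       # "café": the last character takes two bytes in UTF-8
puts byte_length(word)   # 5
puts char_length(word)   # 4
puts byte_length("abc") == char_length("abc")  # true for ASCII-only text
```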
#replace(pattern, value, literal: false) ⇒ Expr
Replace first matching regex/literal substring with a new string value.
# File 'lib/polars/string_expr.rb', line 901
def replace(pattern, value, literal: false)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  value = Utils.expr_to_lit_or_expr(value, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace(pattern._rbexpr, value._rbexpr, literal))
end
#replace_all(pattern, value, literal: false) ⇒ Expr
Replace all matching regex/literal substrings with a new string value.
# File 'lib/polars/string_expr.rb', line 932
def replace_all(pattern, value, literal: false)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  value = Utils.expr_to_lit_or_expr(value, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_all(pattern._rbexpr, value._rbexpr, literal))
end
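The difference between replace (first match only) and replace_all (every match), and the effect of `literal`, can be modelled on one string with Ruby's `sub` and `gsub`. The `_sketch` helpers are hypothetical, and Ruby's regex and replacement syntax stand in for the engine Polars uses, which may differ in details.

```ruby
# Hypothetical helpers: `literal: true` passes the pattern as a plain string.
def replace_sketch(value, pattern, replacement, literal: false)
  value.sub(literal ? pattern : Regexp.new(pattern), replacement)
end

def replace_all_sketch(value, pattern, replacement, literal: false)
  value.gsub(literal ? pattern : Regexp.new(pattern), replacement)
end

puts replace_sketch("abcabc", "b", "X")                     # aXcabc
puts replace_all_sketch("abcabc", "b", "X")                 # aXcaXc
puts replace_all_sketch("a.b.c", ".", "-", literal: true)   # a-b-c
puts replace_all_sketch("a.b.c", ".", "-")                  # -----
```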
#rjust(width, fillchar = " ") ⇒ Expr
Return the string right justified in a string of length width.
Padding is done using the specified fillchar.
The original string is returned if width is less than or equal to s.length.
# File 'lib/polars/string_expr.rb', line 433
def rjust(width, fillchar = " ")
  Utils.wrap_expr(_rbexpr.str_rjust(width, fillchar))
end
#rstrip(matches = nil) ⇒ Expr
Remove trailing whitespace.
# File 'lib/polars/string_expr.rb', line 310
def rstrip(matches = nil)
  if !matches.nil? && matches.length > 1
    raise ArgumentError, "matches should contain a single character"
  end
  Utils.wrap_expr(_rbexpr.str_rstrip(matches))
end
#slice(offset, length = nil) ⇒ Expr
Create subslices of the string values of a Utf8 Series.
# File 'lib/polars/string_expr.rb', line 968
def slice(offset, length = nil)
  Utils.wrap_expr(_rbexpr.str_slice(offset, length))
end
#split(by, inclusive: false) ⇒ Expr
Split the string by a substring.
# File 'lib/polars/string_expr.rb', line 786
def split(by, inclusive: false)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_inclusive(by))
  else
    Utils.wrap_expr(_rbexpr.str_split(by))
  end
end
#split_exact(by, n, inclusive: false) ⇒ Expr
Split the string by a substring using n splits.
Results in a struct of n+1 fields.
If it cannot make n splits, the remaining field elements will be null.
# File 'lib/polars/string_expr.rb', line 831
def split_exact(by, n, inclusive: false)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_exact_inclusive(by, n))
  else
    Utils.wrap_expr(_rbexpr.str_split_exact(by, n))
  end
end
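The "n+1 fields, null-padded" rule can be modelled for one string as below. This is a plain-Ruby sketch: `split_exact_sketch` is hypothetical, an Array stands in for the struct, and `nil` stands in for null. The text does not say what happens when the string contains more than n separators; keeping only the first n+1 pieces is an assumption of this sketch.

```ruby
# Hypothetical helper: always returns n + 1 slots, padding with nil.
def split_exact_sketch(value, by, n)
  parts = value.split(by, -1).first(n + 1)  # assumption: extra pieces are dropped
  parts + [nil] * (n + 1 - parts.length)    # pad missing fields with nil
end

p split_exact_sketch("a_1", "_", 1)  # ["a", "1"]
p split_exact_sketch("c", "_", 1)    # ["c", nil]
p split_exact_sketch("a_b", "_", 2)  # ["a", "b", nil]
```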
#splitn(by, n) ⇒ Expr
Split the string by a substring, restricted to returning at most n items.
If the number of possible splits is less than n-1, the remaining field elements will be null. If the number of possible splits is n-1 or greater, the last (nth) substring will contain the remainder of the string.
# File 'lib/polars/string_expr.rb', line 870
def splitn(by, n)
  Utils.wrap_expr(_rbexpr.str_splitn(by, n))
end
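Ruby's `String#split` with a positive limit behaves the same way for the remainder rule, so the description can be sketched for one string as follows. `splitn_sketch` is a hypothetical helper, an Array stands in for the result fields, `nil` stands in for null, and n is assumed to be positive.

```ruby
# Hypothetical helper: at most n items, last one keeps the remainder,
# missing items are padded with nil.
def splitn_sketch(value, by, n)
  parts = value.split(by, n)          # a positive limit caps the number of pieces
  parts + [nil] * (n - parts.length)  # pad when fewer than n pieces exist
end

p splitn_sketch("a,b,c", ",", 2)  # ["a", "b,c"]
p splitn_sketch("a,b,c", ",", 3)  # ["a", "b", "c"]
p splitn_sketch("a", ",", 2)      # ["a", nil]
```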
#starts_with(sub) ⇒ Expr
Check if string values start with a substring.
# File 'lib/polars/string_expr.rb', line 552
def starts_with(sub)
  Utils.wrap_expr(_rbexpr.str_starts_with(sub))
end
#strip(matches = nil) ⇒ Expr
Remove leading and trailing whitespace.
# File 'lib/polars/string_expr.rb', line 250
def strip(matches = nil)
  if !matches.nil? && matches.length > 1
    raise ArgumentError, "matches should contain a single character"
  end
  Utils.wrap_expr(_rbexpr.str_strip(matches))
end
#strptime(datatype, fmt = nil, strict: true, exact: true, cache: true, tz_aware: false) ⇒ Expr
Parse a Utf8 expression to a Date/Datetime/Time type.
Note: When parsing a Datetime, the column precision is inferred from the format string, if given, e.g. "%F %T%.3f" => Datetime("ms"). If no fractional second component is found, the default is "us".
# File 'lib/polars/string_expr.rb', line 67
def strptime(datatype, fmt = nil, strict: true, exact: true, cache: true, tz_aware: false)
  if !Utils.is_polars_dtype(datatype)
    raise ArgumentError, "expected: {DataType} got: #{datatype}"
  end

  if datatype == :date
    Utils.wrap_expr(_rbexpr.str_parse_date(fmt, strict, exact, cache))
  elsif datatype == :datetime
    # TODO fix
    tu = nil # datatype.tu
    dtcol = Utils.wrap_expr(_rbexpr.str_parse_datetime(fmt, strict, exact, cache, tz_aware))
    if tu.nil?
      dtcol
    else
      dtcol.dt.cast_time_unit(tu)
    end
  elsif datatype == :time
    Utils.wrap_expr(_rbexpr.str_parse_time(fmt, strict, exact, cache))
  else
    raise ArgumentError, "dtype should be of type :date, :datetime, or :time"
  end
end
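The precision rule from the note can be written as a small lookup. This sketch models only the two cases the text states ("%.3f" gives "ms", no fractional component gives "us"); `inferred_time_unit` is a hypothetical helper, and other fractional specifiers are deliberately rejected rather than guessed.

```ruby
# Hypothetical helper modelling only the precision rule stated in the note.
def inferred_time_unit(fmt)
  return "ms" if fmt.include?("%.3f")  # the one mapping the text gives explicitly
  if fmt.match?(/%\.\d+f/)
    # Other fractional-second specifiers are not described in the text.
    raise NotImplementedError, "precision for #{fmt} is not modelled here"
  end
  "us"                                 # default: no fractional second component
end

puts inferred_time_unit("%F %T%.3f")  # ms
puts inferred_time_unit("%F %T")      # us
```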
#to_lowercase ⇒ Expr
Transform to lowercase variant.
# File 'lib/polars/string_expr.rb', line 223
def to_lowercase
  Utils.wrap_expr(_rbexpr.str_to_lowercase)
end
#to_uppercase ⇒ Expr
Transform to uppercase variant.
# File 'lib/polars/string_expr.rb', line 201
def to_uppercase
  Utils.wrap_expr(_rbexpr.str_to_uppercase)
end
#zfill(alignment) ⇒ Expr
Fills the string with zeroes.
Return a copy of the string left filled with ASCII '0' digits to make a string of length alignment.
A leading sign prefix ('+'/'-') is handled by inserting the padding after the sign character rather than before. The original string is returned if alignment is less than or equal to s.length.
# File 'lib/polars/string_expr.rb', line 363
def zfill(alignment)
  Utils.wrap_expr(_rbexpr.str_zfill(alignment))
end
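The sign-handling rule is easiest to see on single strings. The following plain-Ruby sketch follows the description step by step; `zfill_sketch` is a hypothetical helper whose second argument plays the role of the target length passed to zfill.

```ruby
# Hypothetical helper spelling out the zero-fill rule for one string.
def zfill_sketch(s, width)
  return s if width <= s.length                    # already long enough: unchanged
  sign = s.start_with?("+", "-") ? s[0] : ""       # keep a leading sign in front
  digits = s[sign.length..]
  sign + digits.rjust(width - sign.length, "0")    # zeros go after the sign
end

puts zfill_sketch("7", 3)      # 007
puts zfill_sketch("-42", 6)    # -00042
puts zfill_sketch("12345", 3)  # 12345
```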