Class: Polars::StringNameSpace
- Inherits:
- Object
- Defined in:
- lib/polars/string_name_space.rb
Overview
Series.str namespace.
Instance Method Summary
-
#contains(pattern, literal: false) ⇒ Series
Check if strings in Series contain a substring that matches a regex.
-
#contains_any(patterns, ascii_case_insensitive: false) ⇒ Series
Use the Aho-Corasick algorithm to find matches.
-
#count_matches(pattern) ⇒ Series
(also: #count_match)
Count all successive non-overlapping regex matches.
-
#decode(encoding, strict: false) ⇒ Series
Decode a value using the provided encoding.
-
#encode(encoding) ⇒ Series
Encode a value using the provided encoding.
-
#ends_with(sub) ⇒ Series
Check if string values end with a substring.
-
#escape_regex ⇒ Series
Returns string values with all regular expression meta characters escaped.
-
#extract(pattern, group_index: 1) ⇒ Series
Extract the target capture group from provided patterns.
-
#extract_all(pattern) ⇒ Series
Extracts all matches for the given regex pattern.
-
#extract_groups(pattern) ⇒ Series
Extract all capture groups for the given regex pattern.
-
#extract_many(patterns, ascii_case_insensitive: false, overlapping: false) ⇒ Series
Use the Aho-Corasick algorithm to extract many matches.
-
#find(pattern, literal: false, strict: true) ⇒ Series
Return the byte offset of the first substring matching a pattern.
-
#find_many(patterns, ascii_case_insensitive: false, overlapping: false) ⇒ Series
Use the Aho-Corasick algorithm to find all matches.
-
#head(n) ⇒ Series
Return the first n characters of each string in a String Series.
-
#join(delimiter = "-", ignore_nulls: true) ⇒ Series
(also: #concat)
Vertically concat the values in the Series to a single string value.
-
#json_decode(dtype = nil, infer_schema_length: 100) ⇒ Series
Parse string values as JSON.
-
#json_path_match(json_path) ⇒ Series
Extract the first match of a JSON string with the provided JSONPath expression.
-
#len_bytes ⇒ Series
(also: #lengths)
Return the length of each string as the number of bytes.
-
#len_chars ⇒ Series
(also: #n_chars)
Return the length of each string as the number of characters.
-
#ljust(width, fillchar = " ") ⇒ Series
Return the string left justified in a string of length width.
-
#normalize(form = "NFC") ⇒ Series
Returns the Unicode normal form of the string values.
-
#pad_end(length, fill_char = " ") ⇒ Series
Pad the end of the string until it reaches the given length.
-
#pad_start(length, fill_char = " ") ⇒ Series
Pad the start of the string until it reaches the given length.
-
#replace(pattern, value, literal: false) ⇒ Series
Replace first matching regex/literal substring with a new string value.
-
#replace_all(pattern, value, literal: false) ⇒ Series
Replace all matching regex/literal substrings with a new string value.
-
#replace_many(patterns, replace_with = Expr::NO_DEFAULT, ascii_case_insensitive: false) ⇒ Series
Use the Aho-Corasick algorithm to replace many matches.
-
#reverse ⇒ Series
Returns string values in reversed order.
-
#rjust(width, fillchar = " ") ⇒ Series
Return the string right justified in a string of length width.
-
#slice(offset, length = nil) ⇒ Series
Create subslices of the string values of a Utf8 Series.
-
#split(by, inclusive: false) ⇒ Series
Split the string by a substring.
-
#split_exact(by, n, inclusive: false) ⇒ Series
Split the string by a substring using n splits.
-
#splitn(by, n) ⇒ Series
Split the string by a substring, restricted to returning at most n items.
-
#starts_with(sub) ⇒ Series
Check if string values start with a substring.
-
#strip_chars(matches = nil) ⇒ Series
Remove leading and trailing whitespace.
-
#strip_chars_end(matches = nil) ⇒ Series
(also: #rstrip)
Remove trailing whitespace.
-
#strip_chars_start(matches = nil) ⇒ Series
(also: #lstrip)
Remove leading whitespace.
-
#strip_prefix(prefix) ⇒ Series
Remove prefix.
-
#strip_suffix(suffix) ⇒ Series
Remove suffix.
-
#strptime(datatype, fmt = nil, strict: true, exact: true, cache: true) ⇒ Series
Parse a Series of dtype Utf8 to a Date/Datetime Series.
-
#tail(n) ⇒ Series
Return the last n characters of each string in a String Series.
-
#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Series
Convert a Utf8 column into a Date column.
-
#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Series
Convert a Utf8 column into a Datetime column.
-
#to_decimal(inference_length = 100) ⇒ Series
Convert a String column into a Decimal column.
-
#to_integer(base: 10, dtype: Int64, strict: true) ⇒ Series
Convert a String column into a column of dtype with base radix.
-
#to_lowercase ⇒ Series
Modify the strings to their lowercase equivalent.
-
#to_time(format = nil, strict: true, cache: true) ⇒ Series
Convert a Utf8 column into a Time column.
-
#to_uppercase ⇒ Series
Modify the strings to their uppercase equivalent.
-
#zfill(length) ⇒ Series
Fills the string with zeroes.
Dynamic Method Handling
This class handles dynamic methods through the method_missing method in the class Polars::ExprDispatch.
Instance Method Details
#contains(pattern, literal: false) ⇒ Series
Check if strings in Series contain a substring that matches a regex.
# File 'lib/polars/string_name_space.rb', line 292
def contains(pattern, literal: false)
  super
end
#contains_any(patterns, ascii_case_insensitive: false) ⇒ Series
This method supports matching on string literals only, and does not support regular expression matching.
Use the Aho-Corasick algorithm to find matches.
Determines if any of the patterns are contained in the string.
# File 'lib/polars/string_name_space.rb', line 1282
def contains_any(
  patterns,
  ascii_case_insensitive: false
)
  super
end
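The literal-only behaviour of #contains_any can be pictured with a plain-Ruby model. This is only a sketch of the per-string result: `contains_any_model` is a hypothetical helper, it does not use Aho-Corasick, and returning nil for a nil input is an assumption rather than something this page states.

```ruby
# Hypothetical per-string model of contains_any (not the Polars implementation).
# Patterns are plain substrings, so "." or "*" carry no regex meaning.
def contains_any_model(value, patterns, ascii_case_insensitive: false)
  return nil if value.nil? # assumption: a null input stays null

  if ascii_case_insensitive
    # Fold ASCII letters only, mirroring the option name.
    value = value.downcase(:ascii)
    patterns = patterns.map { |p| p.downcase(:ascii) }
  end
  patterns.any? { |p| value.include?(p) }
end

literal_hit  = contains_any_model("see a.b here", ["a.b"])   # "." matches only a dot
literal_miss = contains_any_model("see axb here", ["a.b"])
folded_hit   = contains_any_model("HELLO", ["ell"], ascii_case_insensitive: true)
```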
#count_matches(pattern) ⇒ Series Also known as: count_match
Count all successive non-overlapping regex matches.
# File 'lib/polars/string_name_space.rb', line 610
def count_matches(pattern)
  super
end
#decode(encoding, strict: false) ⇒ Series
Decode a value using the provided encoding.
# File 'lib/polars/string_name_space.rb', line 422
def decode(encoding, strict: false)
  super
end
#encode(encoding) ⇒ Series
Encode a value using the provided encoding.
# File 'lib/polars/string_name_space.rb', line 444
def encode(encoding)
  super
end
#ends_with(sub) ⇒ Series
Check if string values end with a substring.
# File 'lib/polars/string_name_space.rb', line 373
def ends_with(sub)
  super
end
#escape_regex ⇒ Series
Returns string values with all regular expression meta characters escaped.
# File 'lib/polars/string_name_space.rb', line 1528
def escape_regex
  super
end
#extract(pattern, group_index: 1) ⇒ Series
Extract the target capture group from provided patterns.
# File 'lib/polars/string_name_space.rb', line 532
def extract(pattern, group_index: 1)
  super
end
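As a rough illustration of what group_index selects in #extract, the sketch below uses Ruby's own Regexp on a single string. `extract_model` is a hypothetical helper; Polars uses a different regex engine, so syntax details may differ, and the nil result for a non-matching string is an assumption.

```ruby
# Hypothetical single-string model of extract (not the Polars implementation).
# Index 0 is the whole match; 1, 2, ... are the capture groups in order.
def extract_model(value, pattern, group_index: 1)
  match = Regexp.new(pattern).match(value)
  match && match[group_index]
end

local_part = extract_model("user@example", '(\w+)@(\w+)')                 # first group
domain     = extract_model("user@example", '(\w+)@(\w+)', group_index: 2) # second group
whole      = extract_model("user@example", '(\w+)@(\w+)', group_index: 0) # entire match
```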
#extract_all(pattern) ⇒ Series
Extracts all matches for the given regex pattern.
Extract each successive non-overlapping regex match in an individual string as an array.
# File 'lib/polars/string_name_space.rb', line 556
def extract_all(pattern)
  super
end
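"Successive non-overlapping" matches, as used by #extract_all and #count_matches, behave much like Ruby's String#scan. The snippet below is a plain-Ruby analogue on single strings, not a call into Polars.

```ruby
# String#scan returns each successive non-overlapping match as an array,
# which is the per-string shape extract_all describes.
text = "123 bla 45 asd"
all_numbers = text.scan(/\d+/)

# The size of that array is the per-string analogue of count_matches.
number_count = all_numbers.size

# Non-overlapping: after matching "aa" the search resumes past it,
# so "aaaa" yields two matches rather than three.
pairs = "aaaa".scan(/aa/)
```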
#extract_groups(pattern) ⇒ Series
All group names are strings.
Extract all capture groups for the given regex pattern.
# File 'lib/polars/string_name_space.rb', line 589
def extract_groups(pattern)
  super
end
#extract_many(patterns, ascii_case_insensitive: false, overlapping: false) ⇒ Series
This method supports matching on string literals only, and does not support regular expression matching.
Use the Aho-Corasick algorithm to extract many matches.
# File 'lib/polars/string_name_space.rb', line 1402
def extract_many(
  patterns,
  ascii_case_insensitive: false,
  overlapping: false
)
  super
end
#find(pattern, literal: false, strict: true) ⇒ Series
To modify regular expression behaviour (such as case-sensitivity) with
flags, use the inline (?iLmsuxU) syntax.
Return the byte offset of the first substring matching a pattern.
If the pattern is not found, returns nil.
# File 'lib/polars/string_name_space.rb', line 351
def find(pattern, literal: false, strict: true)
  super
end
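The distinction between a byte offset and a character index in #find matters for non-ASCII text. The following plain-Ruby sketch models the result for one string; `find_model` is a hypothetical helper built on Ruby's Regexp, which happens to accept the same inline `(?i)` flag syntax for this simple case.

```ruby
# Hypothetical single-string model of find (not the Polars implementation).
# Returns the byte offset of the first match, or nil when nothing matches.
def find_model(value, pattern)
  match = Regexp.new(pattern).match(value)
  match ? match.pre_match.bytesize : nil
end

word = "h\u00e9llo"                      # "é" occupies two bytes in UTF-8
byte_offset = find_model(word, "l")      # counts bytes before the match
char_index  = word.index("l")            # counts characters, for comparison

case_folded = find_model("Hello", "(?i)hello") # inline flag makes the match case-insensitive
not_found   = find_model("Hello", "z")
```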
#find_many(patterns, ascii_case_insensitive: false, overlapping: false) ⇒ Series
This method supports matching on string literals only, and does not support regular expression matching.
Use the Aho-Corasick algorithm to find all matches.
The function returns the byte offset of the start of each match.
The return type will be List<UInt32>.
# File 'lib/polars/string_name_space.rb', line 1472
def find_many(
  patterns,
  ascii_case_insensitive: false,
  overlapping: false
)
  super
end
#head(n) ⇒ Series
Return the first n characters of each string in a String Series.
# File 'lib/polars/string_name_space.rb', line 1158
def head(n)
  super
end
#join(delimiter = "-", ignore_nulls: true) ⇒ Series Also known as: concat
Vertically concat the values in the Series to a single string value.
# File 'lib/polars/string_name_space.rb', line 1508
def join(delimiter = "-", ignore_nulls: true)
  super
end
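A minimal plain-Ruby model of #join on an array standing in for the Series values. `join_model` is a hypothetical helper, and the handling of `ignore_nulls: false` (the whole result becomes nil if any value is nil) is an assumption that this page does not spell out.

```ruby
# Hypothetical model of join over an array of values (not the Polars implementation).
def join_model(values, delimiter = "-", ignore_nulls: true)
  # Assumption: with ignore_nulls: false, a single null makes the result null.
  return nil if !ignore_nulls && values.include?(nil)

  # Otherwise nulls are skipped and the rest is concatenated top to bottom.
  values.compact.join(delimiter)
end

skipped    = join_model(["1", nil, "2"])                      # default delimiter "-"
custom     = join_model(["1", nil, "2"], ", ")
propagated = join_model(["1", nil, "2"], ignore_nulls: false)
```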
#json_decode(dtype = nil, infer_schema_length: 100) ⇒ Series
Parse string values as JSON.
Throws an error if invalid JSON strings are encountered.
# File 'lib/polars/string_name_space.rb', line 472
def json_decode(dtype = nil, infer_schema_length: 100)
  super
end
#json_path_match(json_path) ⇒ Series
Extract the first match of a JSON string with the provided JSONPath expression.
Throws an error if invalid JSON strings are encountered. All return values are cast to Utf8 regardless of the original type.
Documentation on JSONPath standard can be found here.
# File 'lib/polars/string_name_space.rb', line 504
def json_path_match(json_path)
  super
end
#len_bytes ⇒ Series Also known as: lengths
Return the length of each string as the number of bytes.
# File 'lib/polars/string_name_space.rb', line 233
def len_bytes
  super
end
#len_chars ⇒ Series Also known as: n_chars
Return the length of each string as the number of characters.
# File 'lib/polars/string_name_space.rb', line 254
def len_chars
  super
end
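The difference between #len_bytes and #len_chars only shows up with multi-byte characters. Ruby's own String methods make the same distinction, which the short plain-Ruby comparison below illustrates on a single UTF-8 string.

```ruby
# "café" has four characters, but "é" takes two bytes in UTF-8.
word = "caf\u00e9"

byte_length = word.bytesize # analogue of len_bytes
char_length = word.length   # analogue of len_chars

# For pure ASCII text the two measures agree.
ascii_same = "cafe".bytesize == "cafe".length
```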
#ljust(width, fillchar = " ") ⇒ Series
Return the string left justified in a string of length width.
Padding is done using the specified fillchar. The original string is
returned if width is less than or equal to s.length.
# File 'lib/polars/string_name_space.rb', line 999
def ljust(width, fillchar = " ")
  super
end
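Ruby's built-in String#ljust follows the same contract described for #ljust, so it serves as a quick single-string illustration. This is plain Ruby, not a Polars call.

```ruby
# Pad on the right with the fill character up to the requested width.
padded = "cow".ljust(8, "*")

# When the width does not exceed the string's length, the string is unchanged.
unchanged = "monkey".ljust(3, "*")

# The default fill character is a space, as in the signature above.
spaced = "cow".ljust(5)
```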
#normalize(form = "NFC") ⇒ Series
Returns the Unicode normal form of the string values.
This uses the forms described in Unicode Standard Annex 15: https://www.unicode.org/reports/tr15/.
# File 'lib/polars/string_name_space.rb', line 1561
def normalize(form = "NFC")
  super
end
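To see what a normal form changes, the plain-Ruby sketch below uses String#unicode_normalize, which implements the same Annex 15 forms. It works on a single string and is only an analogue of #normalize, not a call into Polars.

```ruby
# "e" followed by a combining acute accent: two code points, one visible character.
decomposed = "e\u0301"

# NFC composes the pair into the single precomposed code point U+00E9.
composed = decomposed.unicode_normalize(:nfc)

# NFD goes the other way and splits it back into base letter plus accent.
round_trip = composed.unicode_normalize(:nfd)

lengths = [decomposed.length, composed.length]
```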
#pad_end(length, fill_char = " ") ⇒ Series
Pad the end of the string until it reaches the given length.
# File 'lib/polars/string_name_space.rb', line 941
def pad_end(length, fill_char = " ")
  super
end
#pad_start(length, fill_char = " ") ⇒ Series
Pad the start of the string until it reaches the given length.
# File 'lib/polars/string_name_space.rb', line 915
def pad_start(length, fill_char = " ")
  super
end
#replace(pattern, value, literal: false) ⇒ Series
Replace first matching regex/literal substring with a new string value.
# File 'lib/polars/string_name_space.rb', line 749
def replace(pattern, value, literal: false)
  super
end
#replace_all(pattern, value, literal: false) ⇒ Series
Replace all matching regex/literal substrings with a new string value.
# File 'lib/polars/string_name_space.rb', line 774
def replace_all(pattern, value, literal: false)
  super
end
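The first-match versus all-matches distinction between #replace and #replace_all, and the effect of `literal`, can be sketched with Ruby's sub and gsub. `replace_model` is a hypothetical helper working on one string; capture-group references in the replacement value are not modelled here.

```ruby
# Hypothetical single-string model of replace / replace_all (not the Polars implementation).
def replace_model(value, pattern, replacement, literal: false, all: false)
  # A String argument to sub/gsub is matched literally; a Regexp is matched as a pattern.
  matcher = literal ? pattern : Regexp.new(pattern)
  all ? value.gsub(matcher, replacement) : value.sub(matcher, replacement)
end

first_literal = replace_model("a.b.c", ".", "-", literal: true)            # only the first dot
all_literal   = replace_model("a.b.c", ".", "-", literal: true, all: true) # every dot
first_regex   = replace_model("a.b.c", ".", "-")                           # "." matches any character
all_regex     = replace_model("a.b.c", ".", "-", all: true)
```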
#replace_many(patterns, replace_with = Expr::NO_DEFAULT, ascii_case_insensitive: false) ⇒ Series
This method supports matching on string literals only, and does not support regular expression matching.
Use the Aho-Corasick algorithm to replace many matches.
# File 'lib/polars/string_name_space.rb', line 1367
def replace_many(
  patterns,
  replace_with = Expr::NO_DEFAULT,
  ascii_case_insensitive: false
)
  super
end
#reverse ⇒ Series
Returns string values in reversed order.
# File 'lib/polars/string_name_space.rb', line 1082
def reverse
  super
end
#rjust(width, fillchar = " ") ⇒ Series
Return the string right justified in a string of length width.
Padding is done using the specified fillchar. The original string is
returned if width is less than or equal to s.length.
# File 'lib/polars/string_name_space.rb', line 1027
def rjust(width, fillchar = " ")
  super
end
#slice(offset, length = nil) ⇒ Series
Create subslices of the string values of a Utf8 Series.
# File 'lib/polars/string_name_space.rb', line 1120
def slice(offset, length = nil)
  s = Utils.wrap_s(_s)
  s.to_frame.select(Polars.col(s.name).str.slice(offset, length)).to_series
end
#split(by, inclusive: false) ⇒ Series
Split the string by a substring.
# File 'lib/polars/string_name_space.rb', line 623
def split(by, inclusive: false)
  super
end
#split_exact(by, n, inclusive: false) ⇒ Series
Split the string by a substring using n splits.
Results in a struct of n+1 fields.
If it cannot make n splits, the remaining field elements will be null.
# File 'lib/polars/string_name_space.rb', line 674
def split_exact(by, n, inclusive: false)
  super
end
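The "n splits, n+1 fields, padded with null" rule for #split_exact can be modelled on one string in plain Ruby. `split_exact_model` is a hypothetical helper that returns an array in place of the struct fields; dropping any pieces beyond the first n+1 is an assumption, since this page only describes the case of too few splits, and `inclusive` is not modelled.

```ruby
# Hypothetical single-string model of split_exact (not the Polars implementation).
def split_exact_model(value, by, n)
  # Split on the literal separator, keeping trailing empty pieces.
  pieces = value.split(/#{Regexp.escape(by)}/, -1)
  # Assumption: pieces beyond the first n + 1 are discarded.
  fields = pieces.first(n + 1)
  # Pad with nil when fewer than n splits were possible.
  fields + [nil] * (n + 1 - fields.size)
end

both_fields = split_exact_model("a_1", "_", 1)
padded      = split_exact_model("c", "_", 1)
truncated   = split_exact_model("a_b_c", "_", 1)
```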
#splitn(by, n) ⇒ Series
Split the string by a substring, restricted to returning at most n items.
If the number of possible splits is less than n-1, the remaining field
elements will be null. If the number of possible splits is n-1 or greater,
the last (nth) substring will contain the remainder of the string.
# File 'lib/polars/string_name_space.rb', line 723
def splitn(by, n)
  s = Utils.wrap_s(_s)
  s.to_frame.select(Polars.col(s.name).str.splitn(by, n)).to_series
end
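The #splitn rules above (at most n items, remainder kept in the last item, nulls when there are too few splits) map closely onto Ruby's String#split with a positive limit. `splitn_model` is a hypothetical single-string helper that returns an array in place of the struct fields, and it assumes n is at least 1.

```ruby
# Hypothetical single-string model of splitn (not the Polars implementation).
def splitn_model(value, by, n)
  # A positive limit caps the result at n items; the last one holds the remainder.
  items = value.split(/#{Regexp.escape(by)}/, n)
  # Fewer than n - 1 possible splits: fill the remaining fields with nil.
  items + [nil] * (n - items.size)
end

with_remainder = splitn_model("foo bar baz", " ", 2)
too_few        = splitn_model("foo", " ", 2)
exact          = splitn_model("a-b-c", "-", 3)
```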
#starts_with(sub) ⇒ Series
Check if string values start with a substring.
# File 'lib/polars/string_name_space.rb', line 395
def starts_with(sub)
  super
end
#strip_chars(matches = nil) ⇒ Series
Remove leading and trailing whitespace.
# File 'lib/polars/string_name_space.rb', line 795
def strip_chars(matches = nil)
  super
end
#strip_chars_end(matches = nil) ⇒ Series Also known as: rstrip
Remove trailing whitespace.
# File 'lib/polars/string_name_space.rb', line 838
def strip_chars_end(matches = nil)
  super
end
#strip_chars_start(matches = nil) ⇒ Series Also known as: lstrip
Remove leading whitespace.
# File 'lib/polars/string_name_space.rb', line 816
def strip_chars_start(matches = nil)
  super
end
#strip_prefix(prefix) ⇒ Series
Remove prefix.
The prefix will be removed from the string exactly once, if found.
# File 'lib/polars/string_name_space.rb', line 864
def strip_prefix(prefix)
  super
end
#strip_suffix(suffix) ⇒ Series
Remove suffix.
The suffix will be removed from the string exactly once, if found.
# File 'lib/polars/string_name_space.rb', line 889
def strip_suffix(suffix)
  super
end
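"Exactly once, if found" for #strip_prefix and #strip_suffix is the same contract as Ruby's String#delete_prefix and String#delete_suffix, so those serve as a single-string illustration. This is plain Ruby, not a Polars call.

```ruby
# A repeated prefix is removed only once.
prefix_once = "foofoobar".delete_prefix("foo")

# The same holds for a repeated suffix.
suffix_once = "barfoofoo".delete_suffix("foo")

# When the prefix is absent the string is returned unchanged.
prefix_absent = "bar".delete_prefix("foo")
```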
#strptime(datatype, fmt = nil, strict: true, exact: true, cache: true) ⇒ Series
Parse a Series of dtype Utf8 to a Date/Datetime Series.
# File 'lib/polars/string_name_space.rb', line 183
def strptime(datatype, fmt = nil, strict: true, exact: true, cache: true)
  super
end
#tail(n) ⇒ Series
Return the last n characters of each string in a String Series.
# File 'lib/polars/string_name_space.rb', line 1195
def tail(n)
  super
end
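Both #tail and #head count characters rather than bytes. The plain-Ruby sketch below models them for one string with a non-negative n; `head_model` and `tail_model` are hypothetical helpers, and returning the whole string when n exceeds its length is an assumption.

```ruby
# Hypothetical single-string models of head and tail (not the Polars implementation).
def head_model(value, n)
  value[0, n] # Ruby string indexing is character-based
end

def tail_model(value, n)
  # Assumption: asking for more characters than exist returns the whole string.
  n >= value.length ? value : value[-n, n]
end

first_two  = head_model("pear", 2)
last_two   = tail_model("pear", 2)
accented   = head_model("\u00e9t\u00e9", 1) # one character, two bytes
whole_word = tail_model("kiwi", 10)
```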
#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Series
Convert a Utf8 column into a Date column.
# File 'lib/polars/string_name_space.rb', line 41
def to_date(format = nil, strict: true, exact: true, cache: true)
  super
end
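A rough single-string picture of #to_date with an explicit format, using Ruby's Date.strptime. `to_date_model` is a hypothetical helper: the ISO default format, and returning nil instead of raising when `strict` is false, are assumptions, and the `exact` and `cache` options are not modelled.

```ruby
require "date"

# Hypothetical single-string model of to_date (not the Polars implementation).
def to_date_model(value, format = "%Y-%m-%d", strict: true)
  Date.strptime(value, format)
rescue ArgumentError
  # Assumption: strict mode raises on unparseable input, non-strict yields null.
  raise if strict
  nil
end

iso_date    = to_date_model("2020-01-02")
custom_date = to_date_model("02/01/2020", "%d/%m/%Y")
lenient     = to_date_model("not a date", strict: false)
```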
#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Series
Convert a Utf8 column into a Datetime column.
# File 'lib/polars/string_name_space.rb', line 86
def to_datetime(
  format = nil,
  time_unit: nil,
  time_zone: nil,
  strict: true,
  exact: true,
  cache: true,
  ambiguous: "raise"
)
  super
end
#to_decimal(inference_length = 100) ⇒ Series
Convert a String column into a Decimal column.
This method infers the needed parameters precision and scale.
# File 'lib/polars/string_name_space.rb', line 213
def to_decimal(inference_length = 100)
  super
end
#to_integer(base: 10, dtype: Int64, strict: true) ⇒ Series
Convert a String column into a column of dtype with base radix.
# File 'lib/polars/string_name_space.rb', line 1239
def to_integer(
  base: 10,
  dtype: Int64,
  strict: true
)
  super
end
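The `base` option of #to_integer corresponds to the radix argument of Ruby's Kernel#Integer, which gives a quick single-string illustration. `to_integer_model` is a hypothetical helper; returning nil for unparseable input when `strict` is false is an assumption, and Ruby accepts a few spellings (such as underscores) that Polars may not.

```ruby
# Hypothetical single-string model of to_integer (not the Polars implementation).
def to_integer_model(value, base: 10, strict: true)
  Integer(value, base)
rescue ArgumentError
  # Assumption: strict mode raises, non-strict mode yields null.
  raise if strict
  nil
end

from_hex    = to_integer_model("ff", base: 16)
from_binary = to_integer_model("110", base: 2)
from_dec    = to_integer_model("42")
lenient     = to_integer_model("zz", strict: false)
```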
#to_lowercase ⇒ Series
Modify the strings to their lowercase equivalent.
# File 'lib/polars/string_name_space.rb', line 1045
def to_lowercase
  super
end
#to_time(format = nil, strict: true, cache: true) ⇒ Series
Convert a Utf8 column into a Time column.
# File 'lib/polars/string_name_space.rb', line 123
def to_time(format = nil, strict: true, cache: true)
  super
end
#to_uppercase ⇒ Series
Modify the strings to their uppercase equivalent.
# File 'lib/polars/string_name_space.rb', line 1063
def to_uppercase
  super
end
#zfill(length) ⇒ Series
Fills the string with zeroes.
Return a copy of the string left filled with ASCII '0' digits to make a string of the given length.
A leading sign prefix ('+'/'-') is handled by inserting the padding after the
sign character rather than before. The original string is returned if length is
less than or equal to s.length.
# File 'lib/polars/string_name_space.rb', line 971
def zfill(length)
  super
end
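The sign-aware padding rule for #zfill is easiest to follow in code. The plain-Ruby sketch below models it for one string; `zfill_model` is a hypothetical helper written from the description above, not the Polars implementation.

```ruby
# Hypothetical single-string model of zfill (not the Polars implementation).
def zfill_model(value, length)
  # Nothing to do when the string is already long enough.
  return value if length <= value.length

  # Keep a leading "+" or "-" in front and pad the rest with zeroes.
  sign = value.start_with?("+", "-") ? value[0] : ""
  rest = value[sign.length..]
  sign + rest.rjust(length - sign.length, "0")
end

plain     = zfill_model("7", 3)
negative  = zfill_model("-42", 5) # zeroes go after the sign
positive  = zfill_model("+1", 4)
unchanged = zfill_model("12345", 3)
```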