Class: Polars::StringNameSpace
- Inherits:
- Object
- Object
- Polars::StringNameSpace
- Defined in:
- lib/polars/string_name_space.rb
Overview
Series.str namespace.
Instance Method Summary
-
#contains(pattern, literal: false) ⇒ Series
Check if strings in Series contain a substring that matches a regex.
-
#contains_any(patterns, ascii_case_insensitive: false) ⇒ Series
Use the Aho-Corasick algorithm to find matches.
-
#count_matches(pattern) ⇒ Series
(also: #count_match)
Count all successive non-overlapping regex matches.
-
#decode(encoding, strict: false) ⇒ Series
Decode a value using the provided encoding.
-
#encode(encoding) ⇒ Series
Encode a value using the provided encoding.
-
#ends_with(sub) ⇒ Series
Check if string values end with a substring.
-
#escape_regex ⇒ Series
Returns string values with all regular expression meta characters escaped.
-
#extract(pattern, group_index: 1) ⇒ Series
Extract the target capture group from provided patterns.
-
#extract_all(pattern) ⇒ Series
Extracts all matches for the given regex pattern.
-
#extract_groups(pattern) ⇒ Series
Extract all capture groups for the given regex pattern.
-
#extract_many(patterns, ascii_case_insensitive: false, overlapping: false) ⇒ Series
Use the Aho-Corasick algorithm to extract many matches.
-
#find(pattern, literal: false, strict: true) ⇒ Series
Return the byte offset of the first substring matching a pattern.
-
#find_many(patterns, ascii_case_insensitive: false, overlapping: false) ⇒ Series
Use the Aho-Corasick algorithm to find all matches.
-
#head(n) ⇒ Series
Return the first n characters of each string in a String Series.
-
#join(delimiter = "-", ignore_nulls: true) ⇒ Series
(also: #concat)
Vertically concat the values in the Series to a single string value.
-
#json_decode(dtype = nil, infer_schema_length: 100) ⇒ Series
Parse string values as JSON.
-
#json_path_match(json_path) ⇒ Series
Extract the first match from a JSON string with the provided JSONPath expression.
-
#len_bytes ⇒ Series
(also: #lengths)
Return the length of each string as the number of bytes.
-
#len_chars ⇒ Series
(also: #n_chars)
Return the length of each string as the number of characters.
-
#ljust(width, fillchar = " ") ⇒ Series
Return the string left justified in a string of length width.
-
#normalize(form = "NFC") ⇒ Series
Returns the Unicode normal form of the string values.
-
#pad_end(length, fill_char = " ") ⇒ Series
Pad the end of the string until it reaches the given length.
-
#pad_start(length, fill_char = " ") ⇒ Series
Pad the start of the string until it reaches the given length.
-
#replace(pattern, value, literal: false) ⇒ Series
Replace first matching regex/literal substring with a new string value.
-
#replace_all(pattern, value, literal: false) ⇒ Series
Replace all matching regex/literal substrings with a new string value.
-
#replace_many(patterns, replace_with = Expr::NO_DEFAULT, ascii_case_insensitive: false) ⇒ Series
Use the Aho-Corasick algorithm to replace many matches.
-
#reverse ⇒ Series
Returns string values in reversed order.
-
#rjust(width, fillchar = " ") ⇒ Series
Return the string right justified in a string of length width.
-
#slice(offset, length = nil) ⇒ Series
Create subslices of the string values of a Utf8 Series.
-
#split(by, inclusive: false) ⇒ Series
Split the string by a substring.
-
#split_exact(by, n, inclusive: false) ⇒ Series
Split the string by a substring using n splits.
-
#splitn(by, n) ⇒ Series
Split the string by a substring, restricted to returning at most n items.
-
#starts_with(sub) ⇒ Series
Check if string values start with a substring.
-
#strip_chars(matches = nil) ⇒ Series
Remove leading and trailing whitespace.
-
#strip_chars_end(matches = nil) ⇒ Series
(also: #rstrip)
Remove trailing whitespace.
-
#strip_chars_start(matches = nil) ⇒ Series
(also: #lstrip)
Remove leading whitespace.
-
#strip_prefix(prefix) ⇒ Series
Remove prefix.
-
#strip_suffix(suffix) ⇒ Series
Remove suffix.
-
#strptime(datatype, fmt = nil, strict: true, exact: true, cache: true) ⇒ Series
Parse a Series of dtype Utf8 to a Date/Datetime Series.
-
#tail(n) ⇒ Series
Return the last n characters of each string in a String Series.
-
#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Series
Convert a Utf8 column into a Date column.
-
#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Series
Convert a Utf8 column into a Datetime column.
-
#to_decimal(inference_length = 100, scale: nil) ⇒ Series
Convert a String column into a Decimal column.
-
#to_integer(base: 10, dtype: Int64, strict: true) ⇒ Series
Convert a String column into a column of dtype with base radix.
-
#to_lowercase ⇒ Series
Modify the strings to their lowercase equivalent.
-
#to_time(format = nil, strict: true, cache: true) ⇒ Series
Convert a Utf8 column into a Time column.
-
#to_uppercase ⇒ Series
Modify the strings to their uppercase equivalent.
-
#zfill(length) ⇒ Series
Fills the string with zeroes.
Dynamic Method Handling
This class handles dynamic methods through the method_missing method in the class Polars::ExprDispatch.
Instance Method Details
#contains(pattern, literal: false) ⇒ Series
Check if strings in Series contain a substring that matches a regex.
# File 'lib/polars/string_name_space.rb', line 296
def contains(pattern, literal: false)
  super
end
#contains_any(patterns, ascii_case_insensitive: false) ⇒ Series
This method supports matching on string literals only, and does not support regular expression matching.
Use the Aho-Corasick algorithm to find matches.
Determines if any of the patterns are contained in the string.
# File 'lib/polars/string_name_space.rb', line 1295
def contains_any(
  patterns,
  ascii_case_insensitive: false
)
  super
end
#count_matches(pattern) ⇒ Series Also known as: count_match
Count all successive non-overlapping regex matches.
# File 'lib/polars/string_name_space.rb', line 623
def count_matches(pattern)
  super
end
#decode(encoding, strict: false) ⇒ Series
Decode a value using the provided encoding.
# File 'lib/polars/string_name_space.rb', line 426
def decode(encoding, strict: false)
  super
end
#encode(encoding) ⇒ Series
Encode a value using the provided encoding.
# File 'lib/polars/string_name_space.rb', line 448
def encode(encoding)
  super
end
#ends_with(sub) ⇒ Series
Check if string values end with a substring.
# File 'lib/polars/string_name_space.rb', line 377
def ends_with(sub)
  super
end
#escape_regex ⇒ Series
Returns string values with all regular expression meta characters escaped.
# File 'lib/polars/string_name_space.rb', line 1541
def escape_regex
  super
end
#extract(pattern, group_index: 1) ⇒ Series
Extract the target capture group from provided patterns.
# File 'lib/polars/string_name_space.rb', line 545
def extract(pattern, group_index: 1)
  super
end
#extract_all(pattern) ⇒ Series
Extracts all matches for the given regex pattern.
Extract each successive non-overlapping regex match in an individual string as an array.
# File 'lib/polars/string_name_space.rb', line 569
def extract_all(pattern)
  super
end
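The "successive non-overlapping" wording used for count_matches and extract_all can be illustrated with plain Ruby's String#scan, which also walks a string left to right and never reuses matched characters. This is only a sketch of the documented semantics, not the library's implementation; the helper names are hypothetical, and Polars uses its own regex engine, whose syntax may differ from Ruby's.

```ruby
# Plain-Ruby sketch of "successive non-overlapping matches".
# extract_all_sketch and count_matches_sketch are hypothetical helpers,
# not part of Polars.
def extract_all_sketch(value, pattern)
  value.scan(pattern)
end

def count_matches_sketch(value, pattern)
  value.scan(pattern).size
end

p extract_all_sketch("123 bla 45 asd", /\d+/)  # => ["123", "45"]
p count_matches_sketch("aaaa", /aa/)           # => 2 (the two matches do not overlap)
```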
#extract_groups(pattern) ⇒ Series
All group names are strings.
Extract all capture groups for the given regex pattern.
# File 'lib/polars/string_name_space.rb', line 602
def extract_groups(pattern)
  super
end
#extract_many(patterns, ascii_case_insensitive: false, overlapping: false) ⇒ Series
This method supports matching on string literals only, and does not support regular expression matching.
Use the Aho-Corasick algorithm to extract many matches.
# File 'lib/polars/string_name_space.rb', line 1415
def extract_many(
  patterns,
  ascii_case_insensitive: false,
  overlapping: false
)
  super
end
#find(pattern, literal: false, strict: true) ⇒ Series
To modify regular expression behaviour (such as case-sensitivity) with
flags, use the inline (?iLmsuxU) syntax.
Return the byte offset of the first substring matching a pattern.
If the pattern is not found, returns nil.
# File 'lib/polars/string_name_space.rb', line 355
def find(pattern, literal: false, strict: true)
  super
end
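Because find reports a byte offset rather than a character index, the two diverge as soon as a multi-byte character precedes the match. The following plain-Ruby sketch shows the difference; find_sketch is a hypothetical helper that mimics the described behaviour (byte offset, or nil when the pattern is absent) and is not the library's code.

```ruby
# Hypothetical helper: byte offset of the first match, or nil when absent.
def find_sketch(value, pattern)
  char_index = value.index(pattern)
  return nil if char_index.nil?

  # Convert the character index into a byte offset.
  value[0...char_index].bytesize
end

p find_sketch("Crab", "a")              # => 2
p find_sketch("Caf\u00e9 au lait", "au") # => 6 (character index is 5; "é" takes two bytes)
p find_sketch("Crab", "z")              # => nil
```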
#find_many(patterns, ascii_case_insensitive: false, overlapping: false) ⇒ Series
This method supports matching on string literals only, and does not support regular expression matching.
Use the Aho-Corasick algorithm to find all matches.
The function returns the byte offset of the start of each match.
The return type will be List<UInt32>.
# File 'lib/polars/string_name_space.rb', line 1485
def find_many(
  patterns,
  ascii_case_insensitive: false,
  overlapping: false
)
  super
end
#head(n) ⇒ Series
Return the first n characters of each string in a String Series.
# File 'lib/polars/string_name_space.rb', line 1171
def head(n)
  super
end
#join(delimiter = "-", ignore_nulls: true) ⇒ Series Also known as: concat
Vertically concat the values in the Series to a single string value.
# File 'lib/polars/string_name_space.rb', line 1521
def join(delimiter = "-", ignore_nulls: true)
  super
end
#json_decode(dtype = nil, infer_schema_length: 100) ⇒ Series
Parse string values as JSON.
Throws an error if invalid JSON strings are encountered.
# File 'lib/polars/string_name_space.rb', line 476
def json_decode(dtype = nil, infer_schema_length: 100)
  if !dtype.nil?
    s = Utils.wrap_s(_s)
    return (
      s.to_frame
        .select_seq(F.col(s.name).str.json_decode(dtype))
        .to_series
    )
  end

  Utils.wrap_s(_s.str_json_decode(infer_schema_length))
end
#json_path_match(json_path) ⇒ Series
Extract the first match from a JSON string with the provided JSONPath expression.
Raises an error if invalid JSON strings are encountered. All return values are cast to Utf8 regardless of the original value.
Documentation on the JSONPath standard can be found here.
# File 'lib/polars/string_name_space.rb', line 517
def json_path_match(json_path)
  super
end
#len_bytes ⇒ Series Also known as: lengths
Return the length of each string as the number of bytes.
# File 'lib/polars/string_name_space.rb', line 237
def len_bytes
  super
end
#len_chars ⇒ Series Also known as: n_chars
Return the length of each string as the number of characters.
# File 'lib/polars/string_name_space.rb', line 258
def len_chars
  super
end
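The distinction between len_bytes and len_chars only shows up with non-ASCII text. Plain Ruby exposes the same two notions as String#bytesize and String#length, so a small sketch can show how the counts differ for UTF-8 strings. The helper names are hypothetical and stand in for the per-value behaviour described above.

```ruby
# Hypothetical per-value helpers mirroring the two length notions.
def len_bytes_sketch(value)
  value.bytesize
end

def len_chars_sketch(value)
  value.length
end

["abc", "Caf\u00e9", "\u6771\u4eac"].each do |value|
  puts "#{value}: #{len_bytes_sketch(value)} bytes, #{len_chars_sketch(value)} chars"
end
# "abc" has 3 bytes and 3 chars, "Café" has 5 bytes and 4 chars,
# "東京" has 6 bytes and 2 chars.
```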
#ljust(width, fillchar = " ") ⇒ Series
Return the string left justified in a string of length width.
Padding is done using the specified fillchar. The original string is
returned if width is less than or equal to s.length.
# File 'lib/polars/string_name_space.rb', line 1012
def ljust(width, fillchar = " ")
  super
end
#normalize(form = "NFC") ⇒ Series
Returns the Unicode normal form of the string values.
This uses the forms described in Unicode Standard Annex 15: https://www.unicode.org/reports/tr15/.
# File 'lib/polars/string_name_space.rb', line 1574
def normalize(form = "NFC")
  super
end
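To make the effect of a normal form concrete, here is a plain-Ruby sketch using String#unicode_normalize from the standard library, which implements the same Annex 15 forms. normalize_sketch is a hypothetical helper that maps the form names used above ("NFC", "NFKC", and so on) onto Ruby's symbols; it illustrates the concept and is not how the library performs normalization.

```ruby
# Hypothetical helper: apply a Unicode normal form to a single string.
def normalize_sketch(value, form = "NFC")
  value.unicode_normalize(form.downcase.to_sym)
end

decomposed = "e\u0301" # "e" followed by a combining acute accent
ligature   = "\uFB01"  # the single-character "fi" ligature

p normalize_sketch(decomposed, "NFC") == "\u00e9" # => true (composed into one code point)
p normalize_sketch(ligature, "NFKC")              # => "fi"
p normalize_sketch(ligature, "NFC") == ligature   # => true (NFC keeps compatibility characters)
```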
#pad_end(length, fill_char = " ") ⇒ Series
Pad the end of the string until it reaches the given length.
# File 'lib/polars/string_name_space.rb', line 954
def pad_end(length, fill_char = " ")
  super
end
#pad_start(length, fill_char = " ") ⇒ Series
Pad the start of the string until it reaches the given length.
# File 'lib/polars/string_name_space.rb', line 928
def pad_start(length, fill_char = " ")
  super
end
#replace(pattern, value, literal: false) ⇒ Series
Replace first matching regex/literal substring with a new string value.
# File 'lib/polars/string_name_space.rb', line 762
def replace(pattern, value, literal: false)
  super
end
#replace_all(pattern, value, literal: false) ⇒ Series
Replace all matching regex/literal substrings with a new string value.
# File 'lib/polars/string_name_space.rb', line 787
def replace_all(pattern, value, literal: false)
  super
end
#replace_many(patterns, replace_with = Expr::NO_DEFAULT, ascii_case_insensitive: false) ⇒ Series
This method supports matching on string literals only, and does not support regular expression matching.
Use the Aho-Corasick algorithm to replace many matches.
# File 'lib/polars/string_name_space.rb', line 1380
def replace_many(
  patterns,
  replace_with = Expr::NO_DEFAULT,
  ascii_case_insensitive: false
)
  super
end
#reverse ⇒ Series
Returns string values in reversed order.
# File 'lib/polars/string_name_space.rb', line 1095
def reverse
  super
end
#rjust(width, fillchar = " ") ⇒ Series
Return the string right justified in a string of length width.
Padding is done using the specified fillchar. The original string is
returned if width is less than or equal to s.length.
# File 'lib/polars/string_name_space.rb', line 1040
def rjust(width, fillchar = " ")
  super
end
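The padding rule described for ljust and rjust matches what Ruby's own String#ljust and String#rjust do for a single string, so those core methods make a convenient illustration. The sketch below applies them to a plain array; it is an analogy for the per-value behaviour, not a call into the library.

```ruby
# Plain-Ruby illustration of left/right justification with a fill character.
words = ["cow", "monkey", "hippopotamus"]

left  = words.map { |w| w.ljust(8, "*") }
right = words.map { |w| w.rjust(8, "*") }

p left  # => ["cow*****", "monkey**", "hippopotamus"]
p right # => ["*****cow", "**monkey", "hippopotamus"]
# "hippopotamus" is already longer than 8 characters, so it is returned unchanged.
```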
#slice(offset, length = nil) ⇒ Series
Create subslices of the string values of a Utf8 Series.
# File 'lib/polars/string_name_space.rb', line 1133
def slice(offset, length = nil)
  s = Utils.wrap_s(_s)
  s.to_frame.select(Polars.col(s.name).str.slice(offset, length)).to_series
end
#split(by, inclusive: false) ⇒ Series
Split the string by a substring.
# File 'lib/polars/string_name_space.rb', line 636
def split(by, inclusive: false)
  super
end
#split_exact(by, n, inclusive: false) ⇒ Series
Split the string by a substring using n splits.
Results in a struct of n+1 fields.
If it cannot make n splits, the remaining field elements will be null.
# File 'lib/polars/string_name_space.rb', line 687
def split_exact(by, n, inclusive: false)
  super
end
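The "n splits give n+1 fields, padded with null" rule can be sketched in plain Ruby. split_exact_sketch is a hypothetical helper that returns an array standing in for the struct fields, with nil standing in for null. The text above does not say what happens when a string contains more than n separators; keeping only the first n + 1 pieces is an assumption of this sketch.

```ruby
# Hypothetical helper: always return n + 1 fields, padding with nil.
def split_exact_sketch(value, by, n)
  # Assumption: pieces beyond the first n + 1 are dropped.
  pieces = value.split(by, -1).first(n + 1)
  pieces + [nil] * (n + 1 - pieces.size)
end

p split_exact_sketch("a_1", "_", 1) # => ["a", "1"]
p split_exact_sketch("c", "_", 1)   # => ["c", nil]
p split_exact_sketch("d_4", "_", 2) # => ["d", "4", nil]
```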
#splitn(by, n) ⇒ Series
Split the string by a substring, restricted to returning at most n items.
If the number of possible splits is less than n-1, the remaining field
elements will be null. If the number of possible splits is n-1 or greater,
the last (nth) substring will contain the remainder of the string.
# File 'lib/polars/string_name_space.rb', line 736
def splitn(by, n)
  s = Utils.wrap_s(_s)
  s.to_frame.select(Polars.col(s.name).str.splitn(by, n)).to_series
end
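The remainder rule for splitn is easiest to see on a few inputs. Ruby's String#split with a positive limit behaves the same way for a single string (the last piece keeps the rest of the string), so the sketch below builds on it. splitn_sketch is a hypothetical helper that returns an array of n items, with nil standing in for null.

```ruby
# Hypothetical helper: at most n items, the last one holding the remainder.
def splitn_sketch(value, by, n)
  pieces = value.split(by, n)
  pieces + [nil] * (n - pieces.size)
end

p splitn_sketch("a_b_c", "_", 2) # => ["a", "b_c"]  (remainder kept in the last item)
p splitn_sketch("a_b", "_", 2)   # => ["a", "b"]
p splitn_sketch("c", "_", 2)     # => ["c", nil]    (fewer than n - 1 splits possible)
```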
#starts_with(sub) ⇒ Series
Check if string values start with a substring.
# File 'lib/polars/string_name_space.rb', line 399
def starts_with(sub)
  super
end
#strip_chars(matches = nil) ⇒ Series
Remove leading and trailing whitespace.
# File 'lib/polars/string_name_space.rb', line 808
def strip_chars(matches = nil)
  super
end
#strip_chars_end(matches = nil) ⇒ Series Also known as: rstrip
Remove trailing whitespace.
# File 'lib/polars/string_name_space.rb', line 851
def strip_chars_end(matches = nil)
  super
end
#strip_chars_start(matches = nil) ⇒ Series Also known as: lstrip
Remove leading whitespace.
# File 'lib/polars/string_name_space.rb', line 829
def strip_chars_start(matches = nil)
  super
end
#strip_prefix(prefix) ⇒ Series
Remove prefix.
The prefix will be removed from the string exactly once, if found.
# File 'lib/polars/string_name_space.rb', line 877
def strip_prefix(prefix)
  super
end
#strip_suffix(suffix) ⇒ Series
Remove suffix.
The suffix will be removed from the string exactly once, if found.
# File 'lib/polars/string_name_space.rb', line 902
def strip_suffix(suffix)
  super
end
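"Exactly once" matters when the prefix or suffix repeats. Ruby's String#delete_prefix and String#delete_suffix follow the same rule for a single string, so they serve as a plain-Ruby illustration of the behaviour described for strip_prefix and strip_suffix; this is an analogy, not the library's code.

```ruby
# Plain-Ruby illustration: only one occurrence is removed, and only at the edge.
without_prefix = ["foo:foo:bar", "bar", "xfoo:bar"].map { |v| v.delete_prefix("foo:") }
without_suffix = ["bar.txt.txt", "bar", "bar.txtx"].map { |v| v.delete_suffix(".txt") }

p without_prefix # => ["foo:bar", "bar", "xfoo:bar"]
p without_suffix # => ["bar.txt", "bar", "bar.txtx"]
```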
#strptime(datatype, fmt = nil, strict: true, exact: true, cache: true) ⇒ Series
Parse a Series of dtype Utf8 to a Date/Datetime Series.
# File 'lib/polars/string_name_space.rb', line 183
def strptime(datatype, fmt = nil, strict: true, exact: true, cache: true)
  super
end
#tail(n) ⇒ Series
Return the last n characters of each string in a String Series.
# File 'lib/polars/string_name_space.rb', line 1208
def tail(n)
  super
end
#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Series
Convert a Utf8 column into a Date column.
# File 'lib/polars/string_name_space.rb', line 41
def to_date(format = nil, strict: true, exact: true, cache: true)
  super
end
#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Series
Convert a Utf8 column into a Datetime column.
# File 'lib/polars/string_name_space.rb', line 86
def to_datetime(
  format = nil,
  time_unit: nil,
  time_zone: nil,
  strict: true,
  exact: true,
  cache: true,
  ambiguous: "raise"
)
  super
end
#to_decimal(inference_length = 100, scale: nil) ⇒ Series
Convert a String column into a Decimal column.
This method infers the needed parameters precision and scale.
# File 'lib/polars/string_name_space.rb', line 213
def to_decimal(inference_length = 100, scale: nil)
  if !scale.nil?
    raise Todo
  end

  Utils.wrap_s(_s.str_to_decimal_infer(inference_length))
end
#to_integer(base: 10, dtype: Int64, strict: true) ⇒ Series
Convert a String column into a column of dtype with base radix.
# File 'lib/polars/string_name_space.rb', line 1252
def to_integer(
  base: 10,
  dtype: Int64,
  strict: true
)
  super
end
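"Base radix" here means the numeral system in which the digits are read. As a plain-Ruby illustration of that idea only, Kernel#Integer accepts a base as its second argument; to_integer_sketch is a hypothetical helper and says nothing about how the library handles dtype or strict.

```ruby
# Hypothetical helper: parse a string of digits in the given base.
def to_integer_sketch(value, base: 10)
  Integer(value, base)
end

p to_integer_sketch("42")            # => 42
p to_integer_sketch("110", base: 2)  # => 6
p to_integer_sketch("ff", base: 16)  # => 255
```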
#to_lowercase ⇒ Series
Modify the strings to their lowercase equivalent.
# File 'lib/polars/string_name_space.rb', line 1058
def to_lowercase
  super
end
#to_time(format = nil, strict: true, cache: true) ⇒ Series
Convert a Utf8 column into a Time column.
# File 'lib/polars/string_name_space.rb', line 123
def to_time(format = nil, strict: true, cache: true)
  super
end
#to_uppercase ⇒ Series
Modify the strings to their uppercase equivalent.
# File 'lib/polars/string_name_space.rb', line 1076
def to_uppercase
  super
end
#zfill(length) ⇒ Series
Fills the string with zeroes.
Return a copy of the string left filled with ASCII '0' digits to make a string of the given length.
A leading sign prefix ('+'/'-') is handled by inserting the padding after the
sign character rather than before. The original string is returned if the
requested length is less than or equal to the length of the string.
# File 'lib/polars/string_name_space.rb', line 984
def zfill(length)
  super
end
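The sign-handling rule for zfill can be written out as a short plain-Ruby sketch. zfill_sketch is a hypothetical helper that follows the description above step by step for a single string; it is not the library's implementation.

```ruby
# Hypothetical helper following the documented zfill rules for one string.
def zfill_sketch(value, length)
  # Strings that are already long enough are returned unchanged.
  return value if length <= value.length

  # Keep a leading sign in front and pad the rest with zeroes.
  sign = value.start_with?("+", "-") ? value[0] : ""
  rest = value[sign.length..-1]
  sign + rest.rjust(length - sign.length, "0")
end

p zfill_sketch("1", 3)   # => "001"
p zfill_sketch("-1", 3)  # => "-01"
p zfill_sketch("+7", 4)  # => "+007"
p zfill_sketch("123", 3) # => "123"
```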