Class: Polars::StringNameSpace

Inherits:
Object
Defined in:
lib/polars/string_name_space.rb

Overview

Series.str namespace.

Dynamic Method Handling

This class handles dynamic methods through the method_missing method in the class Polars::ExprDispatch.

Instance Method Details

#contains(pattern, literal: false) ⇒ Series

Check if strings in Series contain a substring that matches a regex.

Examples:

s = Polars::Series.new(["Crab", "cat and dog", "rab$bit", nil])
s.str.contains("cat|bit")
# =>
# shape: (4,)
# Series: '' [bool]
# [
#         false
#         true
#         true
#         null
# ]
s.str.contains("rab$", literal: true)
# =>
# shape: (4,)
# Series: '' [bool]
# [
#         false
#         false
#         true
#         null
# ]

Parameters:

  • pattern (String)

    A valid regex pattern.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 292

def contains(pattern, literal: false)
  super
end
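
The effect of literal can be reproduced in plain Ruby. The sketch below is only an analogue: it does not call Polars, the helper names contains_regex and contains_literal are hypothetical, and it assumes Ruby's Regexp reads these simple patterns the same way as the regex engine behind #contains.

```ruby
# Plain-Ruby analogue of the two matching modes (not the Polars implementation).

# Regex mode: the pattern is compiled as a regular expression.
def contains_regex(values, pattern)
  regex = Regexp.new(pattern)
  values.map { |v| v.nil? ? nil : v.match?(regex) }
end

# Literal mode: the pattern is searched for as plain text.
def contains_literal(values, pattern)
  values.map { |v| v.nil? ? nil : v.include?(pattern) }
end

values = ["Crab", "cat and dog", "rab$bit", nil]
p contains_regex(values, "cat|bit")   # => [false, true, true, nil]
p contains_regex(values, "rab$")      # => [true, false, false, nil]
p contains_literal(values, "rab$")    # => [false, false, true, nil]
```

Read as a regex, "rab$" anchors "rab" to the end of the string and matches "Crab"; read literally, the "$" is an ordinary character and only "rab$bit" matches.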

#contains_any(patterns, ascii_case_insensitive: false) ⇒ Series

Note:

This method supports matching on string literals only, and does not support regular expression matching.

Use the Aho-Corasick algorithm to find matches.

Determines if any of the patterns are contained in the string.

Examples:

s = Polars::Series.new(
  "lyrics",
  [
    "Everybody wants to rule the world",
    "Tell me what you want, what you really really want",
    "Can you feel the love tonight"
  ]
)
s.str.contains_any(["you", "me"])
# =>
# shape: (3,)
# Series: 'lyrics' [bool]
# [
#         false
#         true
#         true
# ]

Parameters:

  • patterns (Object)

    String patterns to search.

  • ascii_case_insensitive (Boolean) (defaults to: false)

    Enable ASCII-aware case-insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1282

def contains_any(
  patterns,
  ascii_case_insensitive: false
)
  super
end

#count_matches(pattern) ⇒ Series Also known as: count_match

Count all successive non-overlapping regex matches.

Examples:

s = Polars::Series.new("foo", ["123 bla 45 asd", "xyz 678 910t"])
s.str.count_matches('\d')
# =>
# shape: (2,)
# Series: 'foo' [u32]
# [
#         5
#         6
# ]

Parameters:

  • pattern (String)

    A valid regex pattern

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 610

def count_matches(pattern)
  super
end

#decode(encoding, strict: false) ⇒ Series

Decode a value using the provided encoding.

Examples:

s = Polars::Series.new(["666f6f", "626172", nil])
s.str.decode("hex")
# =>
# shape: (3,)
# Series: '' [binary]
# [
#         b"foo"
#         b"bar"
#         null
# ]

Parameters:

  • encoding ("hex", "base64")

    The encoding to use.

  • strict (Boolean) (defaults to: false)

    How to handle invalid inputs:

    • true: An error will be thrown if unable to decode a value.
    • false: Unhandled values will be replaced with nil.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 422

def decode(encoding, strict: false)
  super
end

#encode(encoding) ⇒ Series

Encode a value using the provided encoding.

Examples:

s = Polars::Series.new(["foo", "bar", nil])
s.str.encode("hex")
# =>
# shape: (3,)
# Series: '' [str]
# [
#         "666f6f"
#         "626172"
#         null
# ]

Parameters:

  • encoding ("hex", "base64")

    The encoding to use.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 444

def encode(encoding)
  super
end
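
As a rough illustration of what the "hex" and "base64" encodings of #encode and #decode do to a single value, and of how strict changes the handling of invalid input, here is a plain-Ruby sketch. It does not call Polars; encode_value and decode_value are hypothetical helpers built on String#unpack1 and Array#pack.

```ruby
# Plain-Ruby analogue of encode/decode for one value (hypothetical helpers).
def encode_value(value, encoding)
  return nil if value.nil?

  case encoding
  when "hex" then value.unpack1("H*")
  when "base64" then [value].pack("m0")
  else raise ArgumentError, "unsupported encoding: #{encoding}"
  end
end

def decode_value(value, encoding, strict: false)
  return nil if value.nil?

  case encoding
  when "hex"
    # Only an even number of hex digits can be decoded.
    return [value].pack("H*") if value.match?(/\A(?:\h\h)*\z/)
  when "base64"
    begin
      return value.unpack1("m0")
    rescue ArgumentError
      # Invalid base64: handled below, like any other undecodable value.
    end
  else
    raise ArgumentError, "unsupported encoding: #{encoding}"
  end

  # strict: true raises, strict: false yields nil for the bad value.
  raise ArgumentError, "cannot decode #{value.inspect}" if strict

  nil
end

p encode_value("foo", "hex")       # => "666f6f"
p decode_value("666f6f", "hex")    # => "foo"
p decode_value("not hex", "hex")   # => nil
```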

#ends_with(sub) ⇒ Series

Check if string values end with a substring.

Examples:

s = Polars::Series.new("fruits", ["apple", "mango", nil])
s.str.ends_with("go")
# =>
# shape: (3,)
# Series: 'fruits' [bool]
# [
#         false
#         true
#         null
# ]

Parameters:

  • sub (String)

    Suffix substring.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 373

def ends_with(sub)
  super
end

#escape_regex ⇒ Series

Returns string values with all regular expression meta characters escaped.

Examples:

Polars::Series.new(["abc", "def", nil, "abc(\\w+)"]).str.escape_regex
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "abc"
#         "def"
#         null
#         "abc\(\\w\+\)"
# ]

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1528

def escape_regex
  super
end
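
Escaping matters when a string that contains meta characters has to be matched as-is. The sketch below uses Ruby's built-in Regexp.escape purely as an analogy; the exact set of characters escaped by #escape_regex may differ, and literal_match? is a hypothetical helper.

```ruby
# Regexp.escape is Ruby's counterpart, used here only to illustrate the idea.
def literal_match?(text, raw_pattern)
  Regexp.new(Regexp.escape(raw_pattern)).match?(text)
end

raw = 'abc(\w+)'
puts Regexp.escape(raw)          # prints abc\(\\w\+\)
p Regexp.new(raw).match?(raw)    # => false ("(" is read as a group, not a character)
p literal_match?(raw, raw)       # => true
```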

#extract(pattern, group_index: 1) ⇒ Series

Extract the target capture group from provided patterns.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select([Polars.col("foo").str.extract('(\d+)')])
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ 123 │
# │ 678 │
# └─────┘

Parameters:

  • pattern (String)

    A valid regex pattern

  • group_index (Integer) (defaults to: 1)

    Index of the targeted capture group. Group 0 means the whole pattern, and the first capture group begins at index 1. Defaults to the first capture group.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 532

def extract(pattern, group_index: 1)
  super
end

#extract_all(pattern) ⇒ Series

Extracts all matches for the given regex pattern.

Each successive non-overlapping regex match in an individual string is extracted as an element of an array.

Examples:

s = Polars::Series.new("foo", ["123 bla 45 asd", "xyz 678 910t"])
s.str.extract_all('(\d+)')
# =>
# shape: (2,)
# Series: 'foo' [list[str]]
# [
#         ["123", "45"]
#         ["678", "910"]
# ]

Parameters:

  • pattern (String)

    A valid regex pattern

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 556

def extract_all(pattern)
  super
end

#extract_groups(pattern) ⇒ Series

Note:

All group names are strings.

Extract all capture groups for the given regex pattern.

Examples:

s = Polars::Series.new(
  "url",
  [
    "http://vote.com/ballon_dor?candidate=messi&ref=python",
    "http://vote.com/ballon_dor?candidate=weghorst&ref=polars",
    "http://vote.com/ballon_dor?error=404&ref=rust"
  ]
)
s.str.extract_groups("candidate=(?<candidate>\\w+)&ref=(?<ref>\\w+)")
# =>
# shape: (3,)
# Series: 'url' [struct[2]]
# [
#         {"messi","python"}
#         {"weghorst","polars"}
#         {null,null}
# ]

Parameters:

  • pattern (String)

    A valid regular expression pattern containing at least one capture group, compatible with the regex crate.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 589

def extract_groups(pattern)
  super
end
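
The difference between picking one capture group by index (#extract) and collecting every named group (#extract_groups) can be sketched with Ruby's own MatchData. This is an analogue only: capture_group and capture_groups are hypothetical helpers, and Ruby's Regexp stands in for the regex crate.

```ruby
CANDIDATE_PATTERN = /candidate=(?<candidate>\w+)&ref=(?<ref>\w+)/

# Returns one capture group by index; index 0 is the whole match.
def capture_group(text, pattern, group_index = 1)
  match = pattern.match(text)
  match && match[group_index]
end

# Returns { group name => captured text }, with nil values when nothing matches.
def capture_groups(text, pattern = CANDIDATE_PATTERN)
  match = pattern.match(text)
  pattern.names.to_h { |name| [name, match && match[name]] }
end

p capture_group("123 bla 45 asd", /(\d+) (\w+)/, 0)  # => "123 bla"
p capture_group("123 bla 45 asd", /(\d+) (\w+)/, 2)  # => "bla"
p capture_groups("http://vote.com/ballon_dor?candidate=messi&ref=python")
p capture_groups("http://vote.com/ballon_dor?error=404&ref=rust")
```

The last call returns nil for both names, mirroring the {null,null} struct in the example above.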

#extract_many(patterns, ascii_case_insensitive: false, overlapping: false) ⇒ Series

Note:

This method supports matching on string literals only, and does not support regular expression matching.

Use the Aho-Corasick algorithm to extract many matches.

Examples:

s = Polars::Series.new("values", ["discontent"])
patterns = ["winter", "disco", "onte", "discontent"]
s.str.extract_many(patterns, overlapping: true)
# =>
# shape: (1,)
# Series: 'values' [list[str]]
# [
#         ["disco", "onte", "discontent"]
# ]

Parameters:

  • patterns (Object)

    String patterns to search.

  • ascii_case_insensitive (Boolean) (defaults to: false)

    Enable ASCII-aware case-insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.

  • overlapping (Boolean) (defaults to: false)

    Whether matches may overlap.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1402

def extract_many(
  patterns,
  ascii_case_insensitive: false,
  overlapping: false
)
  super
end

#find(pattern, literal: false, strict: true) ⇒ Series

Note:

To modify regular expression behaviour (such as case-sensitivity) with flags, use the inline (?iLmsuxU) syntax.

Return the byte offset of the first substring matching a pattern.

If the pattern is not found, returns nil.

Examples:

Find the index of the first substring matching a regex pattern:

s = Polars::Series.new("txt", ["Crab", "Lobster", nil, "Crustacean"])
s.str.find("a|e").rename("idx_rx")
# =>
# shape: (4,)
# Series: 'idx_rx' [u32]
# [
#         2
#         5
#         null
#         5
# ]

Find the index of the first substring matching a literal pattern:

s.str.find("e", literal: true).rename("idx_lit")
# =>
# shape: (4,)
# Series: 'idx_lit' [u32]
# [
#         null
#         5
#         null
#         7
# ]

Match against a pattern found in another column (or expression):

p = Polars::Series.new("pat", ["a[bc]", "b.t", "[aeiuo]", "(?i)A[BC]"])
s.str.find(p).rename("idx")
# =>
# shape: (4,)
# Series: 'idx' [u32]
# [
#         2
#         2
#         null
#         5
# ]

Parameters:

  • pattern

    A valid regular expression pattern, compatible with the regex crate.

  • literal (defaults to: false)

    Treat pattern as a literal string, not as a regular expression.

  • strict (defaults to: true)

    Raise an error if the underlying pattern is not a valid regex, otherwise mask out with a null value.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 351

def find(pattern, literal: false, strict: true)
  super
end
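
Because #find reports a byte offset rather than a character index, the two numbers diverge as soon as a multi-byte character precedes the match. A plain-Ruby sketch of that distinction follows; find_byte_offset is a hypothetical helper and Ruby's Regexp stands in for the regex crate.

```ruby
# Byte offset of the first match, or nil when there is no match.
def find_byte_offset(text, pattern, literal: false)
  return nil if text.nil?

  regex = Regexp.new(literal ? Regexp.escape(pattern) : pattern)
  match = regex.match(text)
  # Everything before the match, measured in bytes rather than characters.
  match && match.pre_match.bytesize
end

p find_byte_offset("Crab", "a|e")               # => 2
p find_byte_offset("Lobster", "e", literal: true)  # => 5

accented = "Caf\u00e9 au lait"
p accented.index("au")                 # => 5 (character index)
p find_byte_offset(accented, "au")     # => 6 (the accented e takes two bytes in UTF-8)
```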

#find_many(patterns, ascii_case_insensitive: false, overlapping: false) ⇒ Series

Note:

This method supports matching on string literals only, and does not support regular expression matching.

Use the Aho-Corasick algorithm to find all matches.

The function returns the byte offset of the start of each match. The return type will be List<UInt32>.

Examples:

df = Polars::DataFrame.new({"values" => ["discontent"]})
patterns = ["winter", "disco", "onte", "discontent"]
df.with_columns(
  Polars.col("values")
  .str.extract_many(patterns, overlapping: false)
  .alias("matches"),
  Polars.col("values")
  .str.extract_many(patterns, overlapping: true)
  .alias("matches_overlapping")
)
# =>
# shape: (1, 3)
# ┌────────────┬───────────┬─────────────────────────────────┐
# │ values     ┆ matches   ┆ matches_overlapping             │
# │ ---        ┆ ---       ┆ ---                             │
# │ str        ┆ list[str] ┆ list[str]                       │
# ╞════════════╪═══════════╪═════════════════════════════════╡
# │ discontent ┆ ["disco"] ┆ ["disco", "onte", "discontent"… │
# └────────────┴───────────┴─────────────────────────────────┘
df = Polars::DataFrame.new(
  {
    "values" => ["discontent", "rhapsody"],
    "patterns" => [
      ["winter", "disco", "onte", "discontent"],
      ["rhap", "ody", "coalesce"]
    ]
  }
)
df.select(Polars.col("values").str.find_many("patterns"))
# =>
# shape: (2, 1)
# ┌───────────┐
# │ values    │
# │ ---       │
# │ list[u32] │
# ╞═══════════╡
# │ [0]       │
# │ [0, 5]    │
# └───────────┘

Parameters:

  • patterns (Object)

    String patterns to search.

  • ascii_case_insensitive (Boolean) (defaults to: false)

    Enable ASCII-aware case-insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.

  • overlapping (Boolean) (defaults to: false)

    Whether matches may overlap.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1472

def find_many(
  patterns,
  ascii_case_insensitive: false,
  overlapping: false
)
  super
end

#head(n) ⇒ Series

Return the first n characters of each string in a String Series.

Examples:

Return up to the first 5 characters.

s = Polars::Series.new(["pear", nil, "papaya", "dragonfruit"])
s.str.head(5)
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "pear"
#         null
#         "papay"
#         "drago"
# ]

Return up to the 3rd character from the end.

s = Polars::Series.new(["pear", nil, "papaya", "dragonfruit"])
s.str.head(-3)
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "p"
#         null
#         "pap"
#         "dragonfr"
# ]

Parameters:

  • n (Object)

    Length of the slice (integer or expression). A negative value is also accepted: as the second example shows, it returns everything except the last |n| characters.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1158

def head(n)
  super
end

#join(delimiter = "-", ignore_nulls: true) ⇒ Series Also known as: concat

Vertically concat the values in the Series to a single string value.

Examples:

Polars::Series.new([1, nil, 2]).str.join("-")
# =>
# shape: (1,)
# Series: '' [str]
# [
#         "1-2"
# ]
Polars::Series.new([1, nil, 2]).str.join("-", ignore_nulls: false)
# =>
# shape: (1,)
# Series: '' [str]
# [
#         null
# ]

Parameters:

  • delimiter (String) (defaults to: "-")

    The delimiter to insert between consecutive string values.

  • ignore_nulls (Boolean) (defaults to: true)

    Ignore null values (default). If set to false, null values will be propagated. This means that if the column contains any null values, the output is null.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1508

def join(delimiter = "-", ignore_nulls: true)
  super
end
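
The ignore_nulls behaviour of #join amounts to a simple rule, sketched here on a plain Ruby Array. join_values is a hypothetical helper and does not call Polars.

```ruby
# Joins values into one string; nil stands in for a null value.
def join_values(values, delimiter = "-", ignore_nulls: true)
  # With ignore_nulls: false a single nil makes the whole result nil.
  return nil if !ignore_nulls && values.any?(&:nil?)

  values.compact.map(&:to_s).join(delimiter)
end

p join_values([1, nil, 2])                       # => "1-2"
p join_values([1, nil, 2], ignore_nulls: false)  # => nil
```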

#json_decode(dtype = nil, infer_schema_length: 100) ⇒ Series

Parse string values as JSON.

Throws an error if invalid JSON strings are encountered.

Examples:

s = Polars::Series.new("json", ['{"a":1, "b": true}', nil, '{"a":2, "b": false}'])
s.str.json_decode
# =>
# shape: (3,)
# Series: 'json' [struct[2]]
# [
#         {1,true}
#         null
#         {2,false}
# ]

Parameters:

  • dtype (Object) (defaults to: nil)

    The dtype to cast the extracted value to. If nil, the dtype will be inferred from the JSON value.

  • infer_schema_length (Integer) (defaults to: 100)

    The maximum number of rows to scan for schema inference. If set to nil, the full data may be scanned (this is slow).

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 472

def json_decode(dtype = nil, infer_schema_length: 100)
  super
end

#json_path_match(json_path) ⇒ Series

Extract the first match of json string with provided JSONPath expression.

Raises an error if an invalid JSON string is encountered. All return values are cast to Utf8 (string), regardless of the original type.

Documentation on JSONPath standard can be found here.

Examples:

df = Polars::DataFrame.new(
  {"json_val" => ['{"a":"1"}', nil, '{"a":2}', '{"a":2.1}', '{"a":true}']}
)
df.select(Polars.col("json_val").str.json_path_match("$.a"))[0.., 0]
# =>
# shape: (5,)
# Series: 'json_val' [str]
# [
#         "1"
#         null
#         "2"
#         "2.1"
#         "true"
# ]

Parameters:

  • json_path (String)

    A valid JSON path query string.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 504

def json_path_match(json_path)
  super
end
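
The cast-to-string behaviour of #json_path_match can be imitated with Ruby's JSON library. The sketch is deliberately narrow: json_path_first is a hypothetical helper that handles only a single top-level key (the "$.a" case from the example) and only scalar values; it is not a JSONPath implementation.

```ruby
require "json"

# Looks up one top-level key and returns its value as a String (or nil).
def json_path_first(json_text, key)
  return nil if json_text.nil?

  value = JSON.parse(json_text)[key] # raises JSON::ParserError on invalid JSON
  value.nil? ? nil : value.to_s
end

inputs = ['{"a":"1"}', nil, '{"a":2}', '{"a":2.1}', '{"a":true}']
p inputs.map { |json| json_path_first(json, "a") }
# => ["1", nil, "2", "2.1", "true"]
```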

#len_bytes ⇒ Series Also known as: lengths

Return the length of each string as the number of bytes.

Examples:

s = Polars::Series.new(["Café", "345", "東京", nil])
s.str.len_bytes
# =>
# shape: (4,)
# Series: '' [u32]
# [
#         5
#         3
#         6
#         null
# ]

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 233

def len_bytes
  super
end

#len_chars ⇒ Series Also known as: n_chars

Return the length of each string as the number of characters.

Examples:

s = Polars::Series.new(["Café", "345", "東京", nil])
s.str.len_chars
# =>
# shape: (4,)
# Series: '' [u32]
# [
#         4
#         3
#         2
#         null
# ]

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 254

def len_chars
  super
end
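
The gap between #len_bytes and #len_chars comes from UTF-8 using more than one byte for non-ASCII characters. Ruby's String#bytesize and String#length show the same split; byte_and_char_lengths is a hypothetical helper and the sketch does not call Polars.

```ruby
# Returns [bytes, characters] for each value, or nil for nil.
def byte_and_char_lengths(values)
  values.map { |v| v.nil? ? nil : [v.bytesize, v.length] }
end

# "\u00e9" is a precomposed accented e; "\u6771\u4eac" is the two-character word from the example.
words = ["Caf\u00e9", "345", "\u6771\u4eac", nil]
p byte_and_char_lengths(words)
# => [[5, 4], [3, 3], [6, 2], nil]
```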

#ljust(width, fillchar = " ") ⇒ Series

Return the string left justified in a string of length width.

Padding is done using the specified fillchar. The original string is returned if width is less than or equal to s.length.

Examples:

s = Polars::Series.new("a", ["cow", "monkey", nil, "hippopotamus"])
s.str.ljust(8, "*")
# =>
# shape: (4,)
# Series: 'a' [str]
# [
#         "cow*****"
#         "monkey**"
#         null
#         "hippopotamus"
# ]

Parameters:

  • width (Integer)

    Justify left to this length.

  • fillchar (String) (defaults to: " ")

    Fill with this ASCII character.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 999

def ljust(width, fillchar = " ")
  super
end

#normalize(form = "NFC") ⇒ Series

Returns the Unicode normal form of the string values.

This uses the forms described in Unicode Standard Annex 15: https://www.unicode.org/reports/tr15/.

Examples:

s = Polars::Series.new(["01²", "KADOKAWA"])
s.str.normalize("NFC")
# =>
# shape: (2,)
# Series: '' [str]
# [
#         "01²"
#         "KADOKAWA"
# ]
s.str.normalize("NFKC")
# =>
# shape: (2,)
# Series: '' [str]
# [
#         "012"
#         "KADOKAWA"
# ]

Parameters:

  • form ('NFC', 'NFKC', 'NFD', 'NFKD') (defaults to: "NFC")

    Unicode form to use.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1561

def normalize(form = "NFC")
  super
end
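
For a feel of how the four forms differ, Ruby's own String#unicode_normalize follows the same Unicode Standard Annex 15 definitions. The sketch below uses it as a stand-in for #normalize; normalize_value is a hypothetical helper, and the inputs are written with escapes so the code points are unambiguous.

```ruby
# Maps the form names used above ("NFC", "NFKC", ...) onto Ruby's symbols.
def normalize_value(value, form = "NFC")
  value.nil? ? nil : value.unicode_normalize(form.downcase.to_sym)
end

superscript = "01\u00B2"   # "01" followed by SUPERSCRIPT TWO
fullwidth_k = "\uFF2B"     # FULLWIDTH LATIN CAPITAL LETTER K

p normalize_value(superscript, "NFC") == superscript  # => true (canonical forms keep it)
p normalize_value(superscript, "NFKC")                # => "012"
p normalize_value(fullwidth_k, "NFKC")                # => "K"
p normalize_value("e\u0301", "NFC") == "\u00e9"       # => true (composed into one code point)
p normalize_value("\u00e9", "NFD") == "e\u0301"       # => true (decomposed again)
```

The compatibility forms (NFKC, NFKD) fold look-alike characters such as superscripts and fullwidth letters into their plain equivalents, while the canonical forms (NFC, NFD) only compose or decompose combining sequences.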

#pad_end(length, fill_char = " ") ⇒ Series

Pad the end of the string until it reaches the given length.

Examples:

s = Polars::Series.new(["cow", "monkey", "hippopotamus", nil])
s.str.pad_end(8, "*")
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "cow*****"
#         "monkey**"
#         "hippopotamus"
#         null
# ]

Parameters:

  • length (Integer)

    Pad the string until it reaches this length. Strings with length equal to or greater than this value are returned as-is.

  • fill_char (String) (defaults to: " ")

    The character to pad the string with.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 941

def pad_end(length, fill_char = " ")
  super
end

#pad_start(length, fill_char = " ") ⇒ Series

Pad the start of the string until it reaches the given length.

Examples:

s = Polars::Series.new("a", ["cow", "monkey", "hippopotamus", nil])
s.str.pad_start(8, "*")
# =>
# shape: (4,)
# Series: 'a' [str]
# [
#         "*****cow"
#         "**monkey"
#         "hippopotamus"
#         null
# ]

Parameters:

  • length (Integer)

    Pad the string until it reaches this length. Strings with length equal to or greater than this value are returned as-is.

  • fill_char (String) (defaults to: " ")

    The character to pad the string with.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 915

def pad_start(length, fill_char = " ")
  super
end

#replace(pattern, value, literal: false) ⇒ Series

Replace first matching regex/literal substring with a new string value.

Examples:

s = Polars::Series.new(["123abc", "abc456"])
s.str.replace('abc\b', "ABC")
# =>
# shape: (2,)
# Series: '' [str]
# [
#         "123ABC"
#         "abc456"
# ]

Parameters:

  • pattern (String)

    A valid regex pattern.

  • value (String)

    String that replaces the matched substring.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 749

def replace(pattern, value, literal: false)
  super
end

#replace_all(pattern, value, literal: false) ⇒ Series

Replace all matching regex/literal substrings with a new string value.

Examples:

s = Polars::Series.new(["abcabc", "123a123"])
s.str.replace_all("a", "-")
# =>
# shape: (2,)
# Series: '' [str]
# [
#         "-bc-bc"
#         "123-123"
# ]

Parameters:

  • pattern (String)

    A valid regex pattern.

  • value (String)

    String that replaces the matched substring.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 774

def replace_all(pattern, value, literal: false)
  super
end
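
The split between #replace (first match only) and #replace_all (every match), and the role of literal in both, maps onto Ruby's String#sub and String#gsub. The helpers replace_first_sketch and replace_all_sketch below are hypothetical, handle plain replacement text only, and use Ruby's Regexp in place of the Polars regex engine.

```ruby
# Builds the matcher: a String is matched literally, a Regexp as a pattern.
def build_matcher(pattern, literal)
  literal ? pattern : Regexp.new(pattern)
end

# Block form keeps the replacement text verbatim.
def replace_first_sketch(value, pattern, replacement, literal: false)
  value.nil? ? nil : value.sub(build_matcher(pattern, literal)) { replacement }
end

def replace_all_sketch(value, pattern, replacement, literal: false)
  value.nil? ? nil : value.gsub(build_matcher(pattern, literal)) { replacement }
end

p replace_first_sketch("abcabc", "a", "-")                  # => "-bcabc"
p replace_all_sketch("abcabc", "a", "-")                    # => "-bc-bc"
p replace_all_sketch("a.c abc", "a.c", "X")                 # => "X X"
p replace_all_sketch("a.c abc", "a.c", "X", literal: true)  # => "X abc"
```

In the last two calls the "." is a wildcard in regex mode and an ordinary character in literal mode, which is why the second word is left alone when literal is true.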

#replace_many(patterns, replace_with = Expr::NO_DEFAULT, ascii_case_insensitive: false) ⇒ Series

Note:

This method supports matching on string literals only, and does not support regular expression matching.

Use the Aho-Corasick algorithm to replace many matches.

Examples:

Replace many patterns by passing lists of equal length to the patterns and replace_with parameters.

s = Polars::Series.new(
  "lyrics",
  [
    "Everybody wants to rule the world",
    "Tell me what you want, what you really really want",
    "Can you feel the love tonight"
  ]
)
s.str.replace_many(["you", "me"], ["me", "you"])
# =>
# shape: (3,)
# Series: 'lyrics' [str]
# [
#         "Everybody wants to rule the wo…
#         "Tell you what me want, what me…
#         "Can me feel the love tonight"
# ]

Broadcast a replacement for many patterns by passing a sequence of length 1 to the replace_with parameter.

s = Polars::Series.new(
  "lyrics",
  [
    "Everybody wants to rule the world",
    "Tell me what you want, what you really really want",
    "Can you feel the love tonight",
  ]
)
s.str.replace_many(["me", "you", "they"], [""])
# =>
# shape: (3,)
# Series: 'lyrics' [str]
# [
#         "Everybody wants to rule the wo…
#         "Tell  what  want, what  really…
#         "Can  feel the love tonight"
# ]

Passing a mapping with patterns and replacements is also supported as syntactic sugar.

s = Polars::Series.new(
  "lyrics",
  [
    "Everybody wants to rule the world",
    "Tell me what you want, what you really really want",
    "Can you feel the love tonight"
  ]
)
mapping = {"me" => "you", "you" => "me", "want" => "need"}
s.str.replace_many(mapping)
# =>
# shape: (3,)
# Series: 'lyrics' [str]
# [
#         "Everybody needs to rule the wo…
#         "Tell you what me need, what me…
#         "Can me feel the love tonight"
# ]

Parameters:

  • patterns

    String patterns to search and replace. Also accepts a mapping of patterns to their replacement as syntactic sugar for replace_many(Polars::Series.new(mapping.keys), Polars::Series.new(mapping.values)).

  • replace_with (defaults to: Expr::NO_DEFAULT)

    Replacement strings used where a pattern matched. Must have the same length as patterns, or length 1, in which case the single replacement is broadcast to every pattern; this supports many:one and many:many replacement.

  • ascii_case_insensitive (defaults to: false)

    Enable ASCII-aware case-insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1367

def replace_many(
  patterns,
  replace_with = Expr::NO_DEFAULT,
  ascii_case_insensitive: false
)
  super
end
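
The swap in the examples above ("me" becomes "you" and "you" becomes "me") only works because all patterns are matched in a single pass over the text. The plain-Ruby sketch below imitates that with one gsub over a union of literal patterns and contrasts it with chained replacement. It is not the Aho-Corasick algorithm, the helper names are hypothetical, and tie-breaking between overlapping patterns may differ from Polars.

```ruby
# Single pass: each match is looked up in the mapping once,
# so text produced by a replacement is never replaced again.
def replace_many_sketch(value, mapping)
  return nil if value.nil?

  value.gsub(Regexp.union(mapping.keys), mapping)
end

# Chained passes, shown for contrast: later passes rewrite earlier output.
def replace_chained(value, mapping)
  mapping.reduce(value) { |text, (from, to)| text.gsub(from, to) }
end

swap = { "me" => "you", "you" => "me" }
p replace_many_sketch("Tell me what you want", swap)  # => "Tell you what me want"
p replace_chained("Tell me what you want", swap)      # => "Tell me what me want"
```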

#reverse ⇒ Series

Returns string values in reversed order.

Examples:

s = Polars::Series.new("text", ["foo", "bar", "man\u0303ana"])
s.str.reverse
# =>
# shape: (3,)
# Series: 'text' [str]
# [
#         "oof"
#         "rab"
#         "anañam"
# ]

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1082

def reverse
  super
end
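
The third value in the example contains a combining tilde ("n" followed by U+0303), and the output keeps the tilde on the "n". That suggests reversal by grapheme cluster rather than by code point. The plain-Ruby sketch below shows the difference; reverse_graphemes is a hypothetical helper and does not call Polars.

```ruby
# Reverses user-perceived characters, keeping combining marks with their base letter.
def reverse_graphemes(value)
  value.nil? ? nil : value.grapheme_clusters.reverse.join
end

word = "man\u0303ana"
p reverse_graphemes(word) == "anan\u0303am"  # => true (tilde stays on the n)
p word.reverse == "ana\u0303nam"             # => true (code-point reversal moves the tilde onto an a)
```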

#rjust(width, fillchar = " ") ⇒ Series

Return the string right justified in a string of length width.

Padding is done using the specified fillchar. The original string is returned if width is less than or equal to s.length.

Examples:

s = Polars::Series.new("a", ["cow", "monkey", nil, "hippopotamus"])
s.str.rjust(8, "*")
# =>
# shape: (4,)
# Series: 'a' [str]
# [
#         "*****cow"
#         "**monkey"
#         null
#         "hippopotamus"
# ]

Parameters:

  • width (Integer)

    Justify right to this length.

  • fillchar (String) (defaults to: " ")

    Fill with this ASCII character.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1027

def rjust(width, fillchar = " ")
  super
end

#slice(offset, length = nil) ⇒ Series

Create subslices of the string values of a Utf8 Series.

Examples:

s = Polars::Series.new("s", ["pear", nil, "papaya", "dragonfruit"])
s.str.slice(-3)
# =>
# shape: (4,)
# Series: 's' [str]
# [
#         "ear"
#         null
#         "aya"
#         "uit"
# ]

Using the optional length parameter:

s.str.slice(4, 3)
# =>
# shape: (4,)
# Series: 's' [str]
# [
#         ""
#         null
#         "ya"
#         "onf"
# ]

Parameters:

  • offset (Integer)

    Start index. Negative indexing is supported.

  • length (Integer) (defaults to: nil)

    Length of the slice. If set to nil (default), the slice is taken to the end of the string.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1120

def slice(offset, length = nil)
  s = Utils.wrap_s(_s)
  s.to_frame.select(Polars.col(s.name).str.slice(offset, length)).to_series
end

#split(by, inclusive: false) ⇒ Series

Split the string by a substring.

Parameters:

  • by (String)

    Substring to split by.

  • inclusive (Boolean) (defaults to: false)

    If true, include the split character/string in the results.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 623

def split(by, inclusive: false)
  super
end
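
What inclusive changes for #split is easiest to see on a single string. The plain-Ruby sketch below is an analogue under stated assumptions: split_sketch is a hypothetical helper, it does not call Polars, and edge cases such as leading or trailing delimiters may be handled differently by the library.

```ruby
def split_sketch(value, by, inclusive: false)
  return nil if value.nil?

  delimiter = Regexp.escape(by)
  # A lookbehind splits just after each delimiter, so the delimiter
  # stays attached to the piece that precedes it.
  pattern = inclusive ? /(?<=#{delimiter})/ : /#{delimiter}/
  value.split(pattern)
end

p split_sketch("foo bar baz", " ")                   # => ["foo", "bar", "baz"]
p split_sketch("foo bar baz", " ", inclusive: true)  # => ["foo ", "bar ", "baz"]
```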

#split_exact(by, n, inclusive: false) ⇒ Series

Split the string by a substring using n splits.

Results in a struct of n+1 fields.

If it cannot make n splits, the remaining field elements will be null.

Examples:

df = Polars::DataFrame.new({"x" => ["a_1", nil, "c", "d_4"]})
df["x"].str.split_exact("_", 1).alias("fields")
# =>
# shape: (4,)
# Series: 'fields' [struct[2]]
# [
#         {"a","1"}
#         {null,null}
#         {"c",null}
#         {"d","4"}
# ]

Split string values in column x into exactly 2 parts and assign each part to a new column.

df["x"]
  .str.split_exact("_", 1)
  .struct.rename_fields(["first_part", "second_part"])
  .alias("fields")
  .to_frame
  .unnest("fields")
# =>
# shape: (4, 2)
# ┌────────────┬─────────────┐
# │ first_part ┆ second_part │
# │ ---        ┆ ---         │
# │ str        ┆ str         │
# ╞════════════╪═════════════╡
# │ a          ┆ 1           │
# │ null       ┆ null        │
# │ c          ┆ null        │
# │ d          ┆ 4           │
# └────────────┴─────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • n (Integer)

    Number of splits to make.

  • inclusive (Boolean) (defaults to: false)

    If true, include the split character/string in the results.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 674

def split_exact(by, n, inclusive: false)
  super
end

#splitn(by, n) ⇒ Series

Split the string by a substring, restricted to returning at most n items.

If the number of possible splits is less than n-1, the remaining field elements will be null. If the number of possible splits is n-1 or greater, the last (nth) substring will contain the remainder of the string.

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", nil, "foo-bar", "foo bar baz"]})
df["s"].str.splitn(" ", 2).alias("fields")
# =>
# shape: (4,)
# Series: 'fields' [struct[2]]
# [
#         {"foo","bar"}
#         {null,null}
#         {"foo-bar",null}
#         {"foo","bar baz"}
# ]

Split string values in column s into exactly 2 parts and assign each part to a new column.

df["s"]
  .str.splitn(" ", 2)
  .struct.rename_fields(["first_part", "second_part"])
  .alias("fields")
  .to_frame
  .unnest("fields")
# =>
# shape: (4, 2)
# ┌────────────┬─────────────┐
# │ first_part ┆ second_part │
# │ ---        ┆ ---         │
# │ str        ┆ str         │
# ╞════════════╪═════════════╡
# │ foo        ┆ bar         │
# │ null       ┆ null        │
# │ foo-bar    ┆ null        │
# │ foo        ┆ bar baz     │
# └────────────┴─────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • n (Integer)

    Max number of items to return.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 723

def splitn(by, n)
  s = Utils.wrap_s(_s)
  s.to_frame.select(Polars.col(s.name).str.splitn(by, n)).to_series
end
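
The rule for #splitn (at most n items, remainder kept in the last item, missing items filled with null) can be sketched with Ruby's String#split and its limit argument. splitn_sketch is a hypothetical helper that returns an Array in place of the struct and does not call Polars.

```ruby
# Returns exactly n fields: real pieces first, nil for any that are missing.
def splitn_sketch(value, by, n)
  return Array.new(n) if value.nil?

  # With a limit, the last piece keeps the unsplit remainder of the string.
  parts = value.split(Regexp.new(Regexp.escape(by)), n)
  parts + Array.new(n - parts.size)
end

p splitn_sketch("foo bar", " ", 2)      # => ["foo", "bar"]
p splitn_sketch(nil, " ", 2)            # => [nil, nil]
p splitn_sketch("foo-bar", " ", 2)      # => ["foo-bar", nil]
p splitn_sketch("foo bar baz", " ", 2)  # => ["foo", "bar baz"]
```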

#starts_with(sub) ⇒ Series

Check if string values start with a substring.

Examples:

s = Polars::Series.new("fruits", ["apple", "mango", nil])
s.str.starts_with("app")
# =>
# shape: (3,)
# Series: 'fruits' [bool]
# [
#         true
#         false
#         null
# ]

Parameters:

  • sub (String)

    Prefix substring.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 395

def starts_with(sub)
  super
end

#strip_chars(matches = nil) ⇒ Series

Remove leading and trailing whitespace.

Examples:

s = Polars::Series.new([" hello ", "\tworld"])
s.str.strip_chars
# =>
# shape: (2,)
# Series: '' [str]
# [
#         "hello"
#         "world"
# ]

Parameters:

  • matches (String, nil) (defaults to: nil)

    An optional single character that should be trimmed

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 795

def strip_chars(matches = nil)
  super
end

#strip_chars_end(matches = nil) ⇒ Series Also known as: rstrip

Remove trailing whitespace.

Examples:

s = Polars::Series.new([" hello ", "world\t"])
s.str.strip_chars_end
# =>
# shape: (2,)
# Series: '' [str]
# [
#         " hello"
#         "world"
# ]

Parameters:

  • matches (String, nil) (defaults to: nil)

    An optional single character that should be trimmed

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 838

def strip_chars_end(matches = nil)
  super
end

#strip_chars_start(matches = nil) ⇒ Series Also known as: lstrip

Remove leading whitespace.

Examples:

s = Polars::Series.new([" hello ", "\tworld"])
s.str.strip_chars_start
# =>
# shape: (2,)
# Series: '' [str]
# [
#         "hello "
#         "world"
# ]

Parameters:

  • matches (String, nil) (defaults to: nil)

    An optional single character that should be trimmed

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 816

def strip_chars_start(matches = nil)
  super
end

#strip_prefix(prefix) ⇒ Series

Remove prefix.

The prefix will be removed from the string exactly once, if found.

Examples:

s = Polars::Series.new(["foobar", "foofoobar", "foo", "bar"])
s.str.strip_prefix("foo")
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "bar"
#         "foobar"
#         ""
#         "bar"
# ]

Parameters:

  • prefix (String)

    The prefix to be removed.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 864

def strip_prefix(prefix)
  super
end

#strip_suffix(suffix) ⇒ Series

Remove suffix.

The suffix will be removed from the string exactly once, if found.

Examples:

s = Polars::Series.new(["foobar", "foobarbar", "foo", "bar"])
s.str.strip_suffix("bar")
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "foo"
#         "foobar"
#         "foo"
#         ""
# ]

Parameters:

  • suffix (String)

    The suffix to be removed.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 889

def strip_suffix(suffix)
  super
end

#strptime(datatype, fmt = nil, strict: true, exact: true, cache: true) ⇒ Series

Parse a Series of dtype Utf8 to a Date/Datetime Series.

Examples:

Dealing with a consistent format:

s = Polars::Series.new(["2020-01-01 01:00Z", "2020-01-01 02:00Z"])
s.str.strptime(Polars::Datetime, "%Y-%m-%d %H:%M%#z")
# =>
# shape: (2,)
# Series: '' [datetime[μs, UTC]]
# [
#         2020-01-01 01:00:00 UTC
#         2020-01-01 02:00:00 UTC
# ]

Dealing with different formats.

s = Polars::Series.new(
  "date",
  [
    "2021-04-22",
    "2022-01-04 00:00:00",
    "01/31/22",
    "Sun Jul  8 00:34:60 2001"
  ]
)
s.to_frame.select(
  Polars.coalesce(
    Polars.col("date").str.strptime(Polars::Date, "%F", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%F %T", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%D", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%c", strict: false)
  )
).to_series
# =>
# shape: (4,)
# Series: 'date' [date]
# [
#         2021-04-22
#         2022-01-04
#         2022-01-31
#         2001-07-08
# ]

Parameters:

  • datatype (Symbol)

    :date, :datetime, or :time.

  • fmt (String) (defaults to: nil)

    Format to use, refer to the chrono strftime documentation for specification. Example: "%y-%m-%d".

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)

    • If true, require an exact format match.
    • If false, allow the format to match anywhere in the target string.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted dates to apply the datetime conversion.

Returns:



# File 'lib/polars/string_name_space.rb', line 183

def strptime(datatype, fmt = nil, strict: true, exact: true, cache: true)
  super
end

#tail(n) ⇒ Series

Return the last n characters of each string in a String Series.

Examples:

Return up to the last 5 characters:

s = Polars::Series.new(["pear", nil, "papaya", "dragonfruit"])
s.str.tail(5)
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "pear"
#         null
#         "apaya"
#         "fruit"
# ]

Skip the first 3 characters and return the rest (negative n):

s = Polars::Series.new(["pear", nil, "papaya", "dragonfruit"])
s.str.tail(-3)
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "r"
#         null
#         "aya"
#         "gonfruit"
# ]
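
The two examples above can be reproduced with a small plain-Ruby sketch. The helper name tail_chars is hypothetical and is not part of Polars; it only mirrors the behavior shown in the example output, including nulls passing through.

```ruby
# Hypothetical helper, not part of Polars: mimics the examples above.
def tail_chars(str, n)
  return nil if str.nil?                  # nulls stay null
  return str[-n..] || str if n.positive?  # last n characters, or the whole string if shorter
  return "" if n.zero?                    # zero-length slice (assumed behavior)
  str[-n..] || ""                         # n < 0: skip the first |n| characters
end

fruits = ["pear", nil, "papaya", "dragonfruit"]
last_five  = fruits.map { |s| tail_chars(s, 5) }
skip_three = fruits.map { |s| tail_chars(s, -3) }
```

Here last_five matches the output of s.str.tail(5) and skip_three matches s.str.tail(-3) as shown above.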

Parameters:

  • n (Object)

    Length of the slice (integer or expression). Negative indexing is supported: a negative n returns all characters except the first |n|, as in the second example above.

Returns:



# File 'lib/polars/string_name_space.rb', line 1195

def tail(n)
  super
end

#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Series

Convert a Utf8 column into a Date column.

Examples:

s = Polars::Series.new(["2020/01/01", "2020/02/01", "2020/03/01"])
s.str.to_date
# =>
# shape: (3,)
# Series: '' [date]
# [
#         2020-01-01
#         2020-02-01
#         2020-03-01
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d". If set to nil (default), the format is inferred from the data.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)

    Require an exact format match. If false, allow the format to match anywhere in the target string.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted dates to apply the conversion.

Returns:



# File 'lib/polars/string_name_space.rb', line 41

def to_date(format = nil, strict: true, exact: true, cache: true)
  super
end

#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Series

Convert a Utf8 column into a Datetime column.

Examples:

s = Polars::Series.new(["2020-01-01 01:00Z", "2020-01-01 02:00Z"])
s.str.to_datetime("%Y-%m-%d %H:%M%#z")
# =>
# shape: (2,)
# Series: '' [datetime[μs, UTC]]
# [
#         2020-01-01 01:00:00 UTC
#         2020-01-01 02:00:00 UTC
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d %H:%M:%S". If set to nil (default), the format is inferred from the data.

  • time_unit ("us", "ns", "ms") (defaults to: nil)

    Unit of time for the resulting Datetime column. If set to nil (default), the time unit is inferred from the format string if given, e.g. "%F %T%.3f" => Datetime("ms"). If no fractional second component is found, the default is "us".

  • time_zone (String) (defaults to: nil)

    Time zone for the resulting Datetime column.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)

    Require an exact format match. If false, allow the format to match anywhere in the target string.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted datetimes to apply the conversion.

  • ambiguous ('raise', 'earliest', 'latest', 'null') (defaults to: "raise")

    Determine how to deal with ambiguous datetimes:

    • 'raise' (default): raise an error
    • 'earliest': use the earliest datetime
    • 'latest': use the latest datetime
    • 'null': set to null

Returns:



# File 'lib/polars/string_name_space.rb', line 86

def to_datetime(
  format = nil,
  time_unit: nil,
  time_zone: nil,
  strict: true,
  exact: true,
  cache: true,
  ambiguous: "raise"
)
  super
end

#to_decimal(inference_length = 100) ⇒ Series

Convert a String column into a Decimal column.

This method infers the required precision and scale from the data.

Examples:

s = Polars::Series.new(
  ["40.12", "3420.13", "120134.19", "3212.98", "12.90", "143.09", "143.9"]
)
s.str.to_decimal
# =>
# shape: (7,)
# Series: '' [decimal[*,2]]
# [
#         40.12
#         3420.13
#         120134.19
#         3212.98
#         12.90
#         143.09
#         143.90
# ]
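
One way to read "infers the scale" is sketched below in plain Ruby: take the largest number of fractional digits among the first inference_length values, then pad every value to that scale. The helpers infer_scale and pad_to_scale are hypothetical and only illustrate the idea; the inference Polars actually performs may differ.

```ruby
# Hypothetical helpers, not part of Polars.

# Largest count of digits after the decimal point among the sampled strings.
def infer_scale(strings, inference_length = 100)
  strings.first(inference_length).compact
         .map { |s| s.split(".", 2)[1].to_s.length }
         .max || 0
end

# Right-pad the fractional part with zeros so every value has `scale` digits.
def pad_to_scale(string, scale)
  whole, frac = string.split(".", 2)
  scale.zero? ? whole : "#{whole}.#{frac.to_s.ljust(scale, "0")}"
end

values = ["40.12", "3420.13", "120134.19", "3212.98", "12.90", "143.09", "143.9"]
scale  = infer_scale(values)
padded = values.map { |v| pad_to_scale(v, scale) }
```

With the values from the example above, the inferred scale is 2 and "143.9" is rendered as "143.90", consistent with the decimal[*,2] output.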

Parameters:

  • inference_length (Integer) (defaults to: 100)

    Number of elements to parse to determine the precision and scale.

Returns:



# File 'lib/polars/string_name_space.rb', line 213

def to_decimal(inference_length = 100)
  super
end

#to_integer(base: 10, dtype: Int64, strict: true) ⇒ Series

Convert a String column into an integer column of the given dtype, parsing each string in the given base (radix).

Examples:

s = Polars::Series.new("bin", ["110", "101", "010", "invalid"])
s.str.to_integer(base: 2, dtype: Polars::Int32, strict: false)
# =>
# shape: (4,)
# Series: 'bin' [i32]
# [
#         6
#         5
#         2
#         null
# ]
s = Polars::Series.new("hex", ["fa1e", "ff00", "cafe", nil])
s.str.to_integer(base: 16)
# =>
# shape: (4,)
# Series: 'hex' [i64]
# [
#         64030
#         65280
#         51966
#         null
# ]
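
The base and strict options have a close analogue in plain Ruby's Kernel#Integer, sketched below. The helper parse_integer is hypothetical and not part of Polars; note also that Kernel#Integer accepts extras such as underscores and radix prefixes, so it is not an exact model of the Polars parser.

```ruby
# Hypothetical helper, not part of Polars: mirrors `base:` and `strict:`.
def parse_integer(str, base: 10, strict: true)
  return nil if str.nil?                  # nulls stay null
  # With exception: false, Integer returns nil instead of raising ArgumentError.
  Integer(str, base, exception: strict)
end

bin = ["110", "101", "010", "invalid"].map { |s| parse_integer(s, base: 2, strict: false) }
hex = ["fa1e", "ff00", "cafe", nil].map { |s| parse_integer(s, base: 16) }
```

With strict: true, the "invalid" entry would raise instead of becoming nil, which corresponds to the error raised by the strict default described under Parameters.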

Parameters:

  • base (Integer) (defaults to: 10)

    Positive integer or expression giving the base (radix) of the strings being parsed.

  • dtype (Object) (defaults to: Int64)

    Polars integer type to cast to. Default: Int64.

  • strict (Boolean) (defaults to: true)

    If true (default), raise a ComputeError on any parse error or overflow. If false, silently convert such values to null.

Returns:



# File 'lib/polars/string_name_space.rb', line 1239

def to_integer(
  base: 10,
  dtype: Int64,
  strict: true
)
  super
end

#to_lowercase ⇒ Series

Modify the strings to their lowercase equivalent.

Examples:

s = Polars::Series.new("foo", ["CAT", "DOG"])
s.str.to_lowercase
# =>
# shape: (2,)
# Series: 'foo' [str]
# [
#         "cat"
#         "dog"
# ]

Returns:



# File 'lib/polars/string_name_space.rb', line 1045

def to_lowercase
  super
end

#to_time(format = nil, strict: true, cache: true) ⇒ Series

Convert a Utf8 column into a Time column.

Examples:

s = Polars::Series.new(["01:00", "02:00", "03:00"])
s.str.to_time("%H:%M")
# =>
# shape: (3,)
# Series: '' [time]
# [
#         01:00:00
#         02:00:00
#         03:00:00
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%H:%M:%S". If set to nil (default), the format is inferred from the data.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted times to apply the conversion.

Returns:



# File 'lib/polars/string_name_space.rb', line 123

def to_time(format = nil, strict: true, cache: true)
  super
end

#to_uppercase ⇒ Series

Modify the strings to their uppercase equivalent.

Examples:

s = Polars::Series.new("foo", ["cat", "dog"])
s.str.to_uppercase
# =>
# shape: (2,)
# Series: 'foo' [str]
# [
#         "CAT"
#         "DOG"
# ]

Returns:



# File 'lib/polars/string_name_space.rb', line 1063

def to_uppercase
  super
end

#zfill(length) ⇒ Series

Fills the string with zeroes.

Return a copy of the string left-filled with ASCII '0' digits to make a string of the given length.

A leading sign prefix ('+'/'-') is handled by inserting the padding after the sign character rather than before. The original string is returned if length is less than or equal to the length of the string.

Examples:

s = Polars::Series.new([-1, 123, 999999, nil])
s.cast(Polars::String).str.zfill(4)
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "-001"
#         "0123"
#         "999999"
#         null
# ]
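
The sign handling described above can be sketched in plain Ruby. The helper zero_fill is hypothetical and not part of Polars; it only reproduces the rules stated in this section (pad after a leading sign, return the string unchanged when it is already long enough).

```ruby
# Hypothetical helper, not part of Polars: pads with "0" after any leading sign.
def zero_fill(str, length)
  return nil if str.nil?                  # nulls stay null
  return str if str.length >= length      # already long enough: unchanged
  sign = str.start_with?("+", "-") ? str[0] : ""
  digits = str[sign.length..]
  sign + digits.rjust(length - sign.length, "0")
end

filled = ["-1", "123", "999999", nil].map { |s| zero_fill(s, 4) }
```

Here filled matches the output of the zfill(4) example above.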

Parameters:

  • length (Integer)

    Fill the value up to this length.

Returns:



# File 'lib/polars/string_name_space.rb', line 971

def zfill(length)
  super
end