Class: Polars::StringExpr

Inherits:
Object
  • Object
show all
Defined in:
lib/polars/string_expr.rb

Overview

Namespace for string related expressions.

Instance Method Summary collapse

Instance Method Details

#contains(pattern, literal: false, strict: true) ⇒ Expr

Check if string contains a substring that matches a regex.

Examples:

df = Polars::DataFrame.new({"a" => ["Crab", "cat and dog", "rab$bit", nil]})
df.select(
  [
    Polars.col("a"),
    Polars.col("a").str.contains("cat|bit").alias("regex"),
    Polars.col("a").str.contains("rab$", literal: true).alias("literal")
  ]
)
# =>
# shape: (4, 3)
# ┌─────────────┬───────┬─────────┐
# │ a           ┆ regex ┆ literal │
# │ ---         ┆ ---   ┆ ---     │
# │ str         ┆ bool  ┆ bool    │
# ╞═════════════╪═══════╪═════════╡
# │ Crab        ┆ false ┆ false   │
# │ cat and dog ┆ true  ┆ false   │
# │ rab$bit     ┆ true  ┆ true    │
# │ null        ┆ null  ┆ null    │
# └─────────────┴───────┴─────────┘

Parameters:

  • pattern (String)

    A valid regex pattern.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

  • strict (Boolean) (defaults to: true)

    Raise an error if the underlying pattern is not a valid regex, otherwise mask out with a null value.

Returns:



768
769
770
771
# File 'lib/polars/string_expr.rb', line 768

def contains(pattern, literal: false, strict: true)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_contains(pattern, literal, strict))
end

#contains_any(patterns, ascii_case_insensitive: false) ⇒ Expr

Use the aho-corasick algorithm to find matches.

This version determines if any of the patterns find a match.

Examples:

df = Polars::DataFrame.new(
  {
    "lyrics": [
      "Everybody wants to rule the world",
      "Tell me what you want, what you really really want",
      "Can you feel the love tonight"
    ]
  }
)
df.with_columns(
  Polars.col("lyrics").str.contains_any(["you", "me"]).alias("contains_any")
)
# =>
# shape: (3, 2)
# ┌─────────────────────────────────┬──────────────┐
# │ lyrics                          ┆ contains_any │
# │ ---                             ┆ ---          │
# │ str                             ┆ bool         │
# ╞═════════════════════════════════╪══════════════╡
# │ Everybody wants to rule the wo… ┆ false        │
# │ Tell me what you want, what yo… ┆ true         │
# │ Can you feel the love tonight   ┆ true         │
# └─────────────────────────────────┴──────────────┘

Parameters:

  • patterns (String)

    String patterns to search.

  • ascii_case_insensitive (Boolean) (defaults to: false)

    Enable ASCII-aware case insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.

Returns:



1676
1677
1678
1679
1680
1681
# File 'lib/polars/string_expr.rb', line 1676

def contains_any(patterns, ascii_case_insensitive: false)
  patterns = Utils.parse_into_expression(patterns, str_as_lit: false)
  Utils.wrap_expr(
    _rbexpr.str_contains_any(patterns, ascii_case_insensitive)
  )
end

#count_matches(pattern, literal: false) ⇒ Expr Also known as: count_match

Count all successive non-overlapping regex matches.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.count_match('\d').alias("count_digits")
  ]
)
# =>
# shape: (2, 1)
# ┌──────────────┐
# │ count_digits │
# │ ---          │
# │ u32          │
# ╞══════════════╡
# │ 5            │
# │ 6            │
# └──────────────┘

Parameters:

  • pattern (String)

    A valid regex pattern

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string, not as a regular expression.

Returns:



1204
1205
1206
1207
# File 'lib/polars/string_expr.rb', line 1204

def count_matches(pattern, literal: false)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_count_matches(pattern, literal))
end

#decode(encoding, strict: true) ⇒ Expr

Decode a value using the provided encoding.

Examples:

df = Polars::DataFrame.new({"encoded" => ["666f6f", "626172", nil]})
df.select(Polars.col("encoded").str.decode("hex"))
# =>
# shape: (3, 1)
# ┌─────────┐
# │ encoded │
# │ ---     │
# │ binary  │
# ╞═════════╡
# │ b"foo"  │
# │ b"bar"  │
# │ null    │
# └─────────┘

Parameters:

  • encoding ("hex", "base64")

    The encoding to use.

  • strict (Boolean) (defaults to: true)

    How to handle invalid inputs:

    • true: An error will be thrown if unable to decode a value.
    • false: Unhandled values will be replaced with nil.

Returns:



1016
1017
1018
1019
1020
1021
1022
1023
1024
# File 'lib/polars/string_expr.rb', line 1016

def decode(encoding, strict: true)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_decode(strict))
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_decode(strict))
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end

#encode(encoding) ⇒ Expr

Encode a value using the provided encoding.

Examples:

df = Polars::DataFrame.new({"strings" => ["foo", "bar", nil]})
df.select(Polars.col("strings").str.encode("hex"))
# =>
# shape: (3, 1)
# ┌─────────┐
# │ strings │
# │ ---     │
# │ str     │
# ╞═════════╡
# │ 666f6f  │
# │ 626172  │
# │ null    │
# └─────────┘

Parameters:

  • encoding ("hex", "base64")

    The encoding to use.

Returns:



1047
1048
1049
1050
1051
1052
1053
1054
1055
# File 'lib/polars/string_expr.rb', line 1047

def encode(encoding)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_encode)
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_encode)
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end

#ends_with(sub) ⇒ Expr

Check if string values end with a substring.

Examples:

df = Polars::DataFrame.new({"fruits" => ["apple", "mango", nil]})
df.with_column(
  Polars.col("fruits").str.ends_with("go").alias("has_suffix")
)
# =>
# shape: (3, 2)
# ┌────────┬────────────┐
# │ fruits ┆ has_suffix │
# │ ---    ┆ ---        │
# │ str    ┆ bool       │
# ╞════════╪════════════╡
# │ apple  ┆ false      │
# │ mango  ┆ true       │
# │ null   ┆ null       │
# └────────┴────────────┘

Using ends_with as a filter condition:

df.filter(Polars.col("fruits").str.ends_with("go"))
# =>
# shape: (1, 1)
# ┌────────┐
# │ fruits │
# │ ---    │
# │ str    │
# ╞════════╡
# │ mango  │
# └────────┘

Parameters:

  • sub (String)

    Suffix substring.

Returns:



870
871
872
873
# File 'lib/polars/string_expr.rb', line 870

def ends_with(sub)
  sub = Utils.parse_into_expression(sub, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_ends_with(sub))
end

#escape_regexExpr

Returns string values with all regular expression meta characters escaped.

Examples:

df = Polars::DataFrame.new({"text" => ["abc", "def", nil, "abc(\\w+)"]})
df.with_columns(Polars.col("text").str.escape_regex.alias("escaped"))
# =>
# shape: (4, 2)
# ┌──────────┬──────────────┐
# │ text     ┆ escaped      │
# │ ---      ┆ ---          │
# │ str      ┆ str          │
# ╞══════════╪══════════════╡
# │ abc      ┆ abc          │
# │ def      ┆ def          │
# │ null     ┆ null         │
# │ abc(\w+) ┆ abc\(\\w\+\) │
# └──────────┴──────────────┘

Returns:



388
389
390
# File 'lib/polars/string_expr.rb', line 388

def escape_regex
  Utils.wrap_expr(_rbexpr.str_escape_regex)
end

#extract(pattern, group_index: 1) ⇒ Expr

Extract the target capture group from provided patterns.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.extract('(\d+)')
  ]
)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ 123 │
# │ 678 │
# └─────┘

Parameters:

  • pattern (String)

    A valid regex pattern

  • group_index (Integer) (defaults to: 1)

    Index of the targeted capture group. Group 0 mean the whole pattern, first group begin at index 1 Default to the first capture group

Returns:



1085
1086
1087
1088
# File 'lib/polars/string_expr.rb', line 1085

def extract(pattern, group_index: 1)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_extract(pattern, group_index))
end

#extract_all(pattern) ⇒ Expr

Extracts all matches for the given regex pattern.

Extracts each successive non-overlapping regex match in an individual string as an array.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.extract_all('(\d+)').alias("extracted_nrs")
  ]
)
# =>
# shape: (2, 1)
# ┌────────────────┐
# │ extracted_nrs  │
# │ ---            │
# │ list[str]      │
# ╞════════════════╡
# │ ["123", "45"]  │
# │ ["678", "910"] │
# └────────────────┘

Parameters:

  • pattern (String)

    A valid regex pattern

Returns:



1117
1118
1119
1120
# File 'lib/polars/string_expr.rb', line 1117

def extract_all(pattern)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_extract_all(pattern))
end

#extract_groups(pattern) ⇒ Expr

Extract all capture groups for the given regex pattern.

Examples:

df = Polars::DataFrame.new(
  {
    "url": [
      "http://vote.com/ballon_dor?candidate=messi&ref=python",
      "http://vote.com/ballon_dor?candidate=weghorst&ref=polars",
      "http://vote.com/ballon_dor?error=404&ref=rust"
    ]
  }
)
pattern = /candidate=(?<candidate>\w+)&ref=(?<ref>\w+)/.to_s
df.select(captures: Polars.col("url").str.extract_groups(pattern)).unnest(
  "captures"
)
# =>
# shape: (3, 2)
# ┌───────────┬────────┐
# │ candidate ┆ ref    │
# │ ---       ┆ ---    │
# │ str       ┆ str    │
# ╞═══════════╪════════╡
# │ messi     ┆ python │
# │ weghorst  ┆ polars │
# │ null      ┆ null   │
# └───────────┴────────┘

Unnamed groups have their numerical position converted to a string:

pattern = /candidate=(\w+)&ref=(\w+)/.to_s
(
  df.with_columns(
    captures: Polars.col("url").str.extract_groups(pattern)
  ).with_columns(name: Polars.col("captures").struct["1"].str.to_uppercase)
)
# =>
# shape: (3, 3)
# ┌─────────────────────────────────┬───────────────────────┬──────────┐
# │ url                             ┆ captures              ┆ name     │
# │ ---                             ┆ ---                   ┆ ---      │
# │ str                             ┆ struct[2]             ┆ str      │
# ╞═════════════════════════════════╪═══════════════════════╪══════════╡
# │ http://vote.com/ballon_dor?can… ┆ {"messi","python"}    ┆ MESSI    │
# │ http://vote.com/ballon_dor?can… ┆ {"weghorst","polars"} ┆ WEGHORST │
# │ http://vote.com/ballon_dor?err… ┆ {null,null}           ┆ null     │
# └─────────────────────────────────┴───────────────────────┴──────────┘

Parameters:

  • pattern (String)

    A valid regular expression pattern containing at least one capture group, compatible with the regex crate.

Returns:



1174
1175
1176
# File 'lib/polars/string_expr.rb', line 1174

def extract_groups(pattern)
  Utils.wrap_expr(_rbexpr.str_extract_groups(pattern))
end

#extract_many(patterns, ascii_case_insensitive: false, overlapping: false) ⇒ Expr

Note:

This method supports matching on string literals only, and does not support regular expression matching.

Use the Aho-Corasick algorithm to extract many matches.

Examples:

df = Polars::DataFrame.new({"values" => ["discontent"]})
patterns = ["winter", "disco", "onte", "discontent"]
df.with_columns(
  Polars.col("values")
  .str.extract_many(patterns, overlapping: false)
  .alias("matches"),
  Polars.col("values")
  .str.extract_many(patterns, overlapping: true)
  .alias("matches_overlapping"),
)
# =>
# shape: (1, 3)
# ┌────────────┬───────────┬─────────────────────────────────┐
# │ values     ┆ matches   ┆ matches_overlapping             │
# │ ---        ┆ ---       ┆ ---                             │
# │ str        ┆ list[str] ┆ list[str]                       │
# ╞════════════╪═══════════╪═════════════════════════════════╡
# │ discontent ┆ ["disco"] ┆ ["disco", "onte", "discontent"… │
# └────────────┴───────────┴─────────────────────────────────┘
df = Polars::DataFrame.new(
  {
    "values" => ["discontent", "rhapsody"],
    "patterns" => [
      ["winter", "disco", "onte", "discontent"],
      ["rhap", "ody", "coalesce"]
    ]
  }
)
df.select(Polars.col("values").str.extract_many("patterns"))
# =>
# shape: (2, 1)
# ┌─────────────────┐
# │ values          │
# │ ---             │
# │ list[str]       │
# ╞═════════════════╡
# │ ["disco"]       │
# │ ["rhap", "ody"] │
# └─────────────────┘

Parameters:

  • patterns (Object)

    String patterns to search.

  • ascii_case_insensitive (Boolean) (defaults to: false)

    Enable ASCII-aware case-insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.

  • overlapping (Boolean) (defaults to: false)

    Whether matches may overlap.

Returns:



1829
1830
1831
1832
1833
1834
1835
1836
1837
1838
# File 'lib/polars/string_expr.rb', line 1829

def extract_many(
  patterns,
  ascii_case_insensitive: false,
  overlapping: false
)
  patterns = Utils.parse_into_expression(patterns, str_as_lit: false)
  Utils.wrap_expr(
    _rbexpr.str_extract_many(patterns, ascii_case_insensitive, overlapping)
  )
end

#find(pattern, literal: false, strict: true) ⇒ Expr

Note:

To modify regular expression behaviour (such as case-sensitivity) with flags, use the inline (?iLmsuxU) syntax.

Return the bytes offset of the first substring matching a pattern.

If the pattern is not found, returns nil.

Examples:

Find the index of the first substring matching a regex or literal pattern:

df = Polars::DataFrame.new(
  {
    "txt" => ["Crab", "Lobster", nil, "Crustacean"],
    "pat" => ["a[bc]", "b.t", "[aeiuo]", "(?i)A[BC]"]
  }
)
df.select(
  Polars.col("txt"),
  Polars.col("txt").str.find("a|e").alias("a|e (regex)"),
  Polars.col("txt").str.find("e", literal: true).alias("e (lit)"),
)
# =>
# shape: (4, 3)
# ┌────────────┬─────────────┬─────────┐
# │ txt        ┆ a|e (regex) ┆ e (lit) │
# │ ---        ┆ ---         ┆ ---     │
# │ str        ┆ u32         ┆ u32     │
# ╞════════════╪═════════════╪═════════╡
# │ Crab       ┆ 2           ┆ null    │
# │ Lobster    ┆ 5           ┆ 5       │
# │ null       ┆ null        ┆ null    │
# │ Crustacean ┆ 5           ┆ 7       │
# └────────────┴─────────────┴─────────┘

Match against a pattern found in another column or (expression):

df.with_columns(Polars.col("txt").str.find(Polars.col("pat")).alias("find_pat"))
# =>
# shape: (4, 3)
# ┌────────────┬───────────┬──────────┐
# │ txt        ┆ pat       ┆ find_pat │
# │ ---        ┆ ---       ┆ ---      │
# │ str        ┆ str       ┆ u32      │
# ╞════════════╪═══════════╪══════════╡
# │ Crab       ┆ a[bc]     ┆ 2        │
# │ Lobster    ┆ b.t       ┆ 2        │
# │ null       ┆ [aeiuo]   ┆ null     │
# │ Crustacean ┆ (?i)A[BC] ┆ 5        │
# └────────────┴───────────┴──────────┘

Parameters:

  • pattern (String)

    A valid regular expression pattern, compatible with the regex crate.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string, not as a regular expression.

  • strict (Boolean) (defaults to: true)

    Raise an error if the underlying pattern is not a valid regex, otherwise mask out with a null value.

Returns:



830
831
832
833
# File 'lib/polars/string_expr.rb', line 830

def find(pattern, literal: false, strict: true)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_find(pattern, literal, strict))
end

#find_many(patterns, ascii_case_insensitive: false, overlapping: false) ⇒ Expr

Note:

This method supports matching on string literals only, and does not support regular expression matching.

Use the Aho-Corasick algorithm to find many matches.

The function will return the bytes offset of the start of each match. The return type will be List<UInt32>

Examples:

df = Polars::DataFrame.new({"values" => ["discontent"]})
patterns = ["winter", "disco", "onte", "discontent"]
df.with_columns(
  Polars.col("values")
  .str.extract_many(patterns, overlapping: false)
  .alias("matches"),
  Polars.col("values")
  .str.extract_many(patterns, overlapping: true)
  .alias("matches_overlapping"),
)
# =>
# shape: (1, 3)
# ┌────────────┬───────────┬─────────────────────────────────┐
# │ values     ┆ matches   ┆ matches_overlapping             │
# │ ---        ┆ ---       ┆ ---                             │
# │ str        ┆ list[str] ┆ list[str]                       │
# ╞════════════╪═══════════╪═════════════════════════════════╡
# │ discontent ┆ ["disco"] ┆ ["disco", "onte", "discontent"… │
# └────────────┴───────────┴─────────────────────────────────┘
df = Polars::DataFrame.new(
  {
    "values" => ["discontent", "rhapsody"],
    "patterns" => [
      ["winter", "disco", "onte", "discontent"],
      ["rhap", "ody", "coalesce"]
    ]
  }
)
df.select(Polars.col("values").str.find_many("patterns"))
# =>
# shape: (2, 1)
# ┌───────────┐
# │ values    │
# │ ---       │
# │ list[u32] │
# ╞═══════════╡
# │ [0]       │
# │ [0, 5]    │
# └───────────┘

Parameters:

  • patterns (Object)

    String patterns to search.

  • ascii_case_insensitive (Boolean) (defaults to: false)

    Enable ASCII-aware case-insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.

  • overlapping (Boolean) (defaults to: false)

    Whether matches may overlap.

Returns:



1902
1903
1904
1905
1906
1907
1908
1909
1910
1911
# File 'lib/polars/string_expr.rb', line 1902

def find_many(
  patterns,
  ascii_case_insensitive: false,
  overlapping: false
)
  patterns = Utils.parse_into_expression(patterns, str_as_lit: false)
  Utils.wrap_expr(
    _rbexpr.str_find_many(patterns, ascii_case_insensitive, overlapping)
  )
end

#head(n) ⇒ Expr

Note:

1) The n input is defined in terms of the number of characters in the (UTF8) string. A character is defined as a Unicode scalar value. A single character is represented by a single byte when working with ASCII text, and a maximum of 4 bytes otherwise.

2) When the n input is negative, head returns characters up to the nth from the end of the string. For example, if n = -3, then all characters except the last three are returned.

3) If the length of the string has fewer than n characters, the full string is returned.

Return the first n characters of each string in a String Series.

Examples:

Return up to the first 5 characters:

df = Polars::DataFrame.new({"s" => ["pear", nil, "papaya", "dragonfruit"]})
df.with_columns(Polars.col("s").str.head(5).alias("s_head_5"))
# =>
# shape: (4, 2)
# ┌─────────────┬──────────┐
# │ s           ┆ s_head_5 │
# │ ---         ┆ ---      │
# │ str         ┆ str      │
# ╞═════════════╪══════════╡
# │ pear        ┆ pear     │
# │ null        ┆ null     │
# │ papaya      ┆ papay    │
# │ dragonfruit ┆ drago    │
# └─────────────┴──────────┘

Return characters determined by column n:

df = Polars::DataFrame.new(
  {
    "s" => ["pear", nil, "papaya", "dragonfruit"],
    "n" => [3, 4, -2, -5]
  }
)
df.with_columns(Polars.col("s").str.head("n").alias("s_head_n"))
# =>
# shape: (4, 3)
# ┌─────────────┬─────┬──────────┐
# │ s           ┆ n   ┆ s_head_n │
# │ ---         ┆ --- ┆ ---      │
# │ str         ┆ i64 ┆ str      │
# ╞═════════════╪═════╪══════════╡
# │ pear        ┆ 3   ┆ pea      │
# │ null        ┆ 4   ┆ null     │
# │ papaya      ┆ -2  ┆ papa     │
# │ dragonfruit ┆ -5  ┆ dragon   │
# └─────────────┴─────┴──────────┘

Parameters:

  • n (Integer)

    Length of the slice (integer or expression). Negative indexing is supported; see note (2) below.

Returns:



1493
1494
1495
1496
# File 'lib/polars/string_expr.rb', line 1493

def head(n)
  n = Utils.parse_into_expression(n)
  Utils.wrap_expr(_rbexpr.str_head(n))
end

#join(delimiter = "-", ignore_nulls: true) ⇒ Expr Also known as: concat

Vertically concat the values in the Series to a single string value.

Examples:

df = Polars::DataFrame.new({"foo" => [1, nil, 2]})
df.select(Polars.col("foo").str.join("-"))
# =>
# shape: (1, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ 1-2 │
# └─────┘
df = Polars::DataFrame.new({"foo" => [1, nil, 2]})
df.select(Polars.col("foo").str.join("-", ignore_nulls: false))
# =>
# shape: (1, 1)
# ┌──────┐
# │ foo  │
# │ ---  │
# │ str  │
# ╞══════╡
# │ null │
# └──────┘

Parameters:

  • delimiter (String) (defaults to: "-")

    The delimiter to insert between consecutive string values.

  • ignore_nulls (Boolean) (defaults to: true)

    Ignore null values (default).

Returns:



364
365
366
# File 'lib/polars/string_expr.rb', line 364

def join(delimiter = "-", ignore_nulls: true)
  Utils.wrap_expr(_rbexpr.str_join(delimiter, ignore_nulls))
end

#json_decode(dtype, infer_schema_length: nil) ⇒ Expr Also known as: json_extract

Parse string values as JSON.

Throw errors if encounter invalid JSON strings.

Examples:

df = Polars::DataFrame.new(
  {"json" => ['{"a":1, "b": true}', nil, '{"a":2, "b": false}']}
)
dtype = Polars::Struct.new([Polars::Field.new("a", Polars::Int64), Polars::Field.new("b", Polars::Boolean)])
df.with_columns(decoded: Polars.col("json").str.json_decode(dtype))
# =>
# shape: (3, 2)
# ┌─────────────────────┬───────────┐
# │ json                ┆ decoded   │
# │ ---                 ┆ ---       │
# │ str                 ┆ struct[2] │
# ╞═════════════════════╪═══════════╡
# │ {"a":1, "b": true}  ┆ {1,true}  │
# │ null                ┆ null      │
# │ {"a":2, "b": false} ┆ {2,false} │
# └─────────────────────┴───────────┘

Parameters:

  • dtype (Object)

    The dtype to cast the extracted value to.

  • infer_schema_length (Integer) (defaults to: nil)

    Deprecated and ignored.

Returns:



943
944
945
946
947
948
949
950
951
# File 'lib/polars/string_expr.rb', line 943

def json_decode(dtype, infer_schema_length: nil)
  if dtype.nil?
    msg = "`Expr.str.json_decode` needs an explicitly given `dtype` otherwise Polars is not able to determine the output type. If you want to eagerly infer datatype you can use `Series.str.json_decode`."
    raise TypeError, msg
  end

  dtype_expr = Utils.parse_into_datatype_expr(dtype)._rbdatatype_expr
  Utils.wrap_expr(_rbexpr.str_json_decode(dtype_expr))
end

#json_path_match(json_path) ⇒ Expr

Extract the first match of json string with provided JSONPath expression.

Throw errors if encounter invalid json strings. All return value will be casted to Utf8 regardless of the original value.

Documentation on JSONPath standard can be found here.

Examples:

df = Polars::DataFrame.new(
  {"json_val" => ['{"a":"1"}', nil, '{"a":2}', '{"a":2.1}', '{"a":true}']}
)
df.select(Polars.col("json_val").str.json_path_match("$.a"))
# =>
# shape: (5, 1)
# ┌──────────┐
# │ json_val │
# │ ---      │
# │ str      │
# ╞══════════╡
# │ 1        │
# │ null     │
# │ 2        │
# │ 2.1      │
# │ true     │
# └──────────┘

Parameters:

  • json_path (String)

    A valid JSON path query string.

Returns:



985
986
987
988
# File 'lib/polars/string_expr.rb', line 985

def json_path_match(json_path)
  json_path = Utils.parse_into_expression(json_path, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_json_path_match(json_path))
end

#len_bytesExpr Also known as: lengths

Note:

The returned lengths are equal to the number of bytes in the UTF8 string. If you need the length in terms of the number of characters, use n_chars instead.

Get length of the strings as :u32 (as number of bytes).

Examples:

df = Polars::DataFrame.new({"s" => ["Café", nil, "345", "東京"]}).with_columns(
  [
    Polars.col("s").str.len_bytes.alias("length"),
    Polars.col("s").str.len_chars.alias("nchars")
  ]
)
df
# =>
# shape: (4, 3)
# ┌──────┬────────┬────────┐
# │ s    ┆ length ┆ nchars │
# │ ---  ┆ ---    ┆ ---    │
# │ str  ┆ u32    ┆ u32    │
# ╞══════╪════════╪════════╡
# │ Café ┆ 5      ┆ 4      │
# │ null ┆ null   ┆ null   │
# │ 345  ┆ 3      ┆ 3      │
# │ 東京 ┆ 6      ┆ 2      │
# └──────┴────────┴────────┘

Returns:



292
293
294
# File 'lib/polars/string_expr.rb', line 292

def len_bytes
  Utils.wrap_expr(_rbexpr.str_len_bytes)
end

#len_charsExpr Also known as: n_chars

Note:

If you know that you are working with ASCII text, lengths will be equivalent, and faster (returns length in terms of the number of bytes).

Get length of the strings as :u32 (as number of chars).

Examples:

df = Polars::DataFrame.new({"s" => ["Café", nil, "345", "東京"]}).with_columns(
  [
    Polars.col("s").str.len_bytes.alias("length"),
    Polars.col("s").str.len_chars.alias("nchars")
  ]
)
df
# =>
# shape: (4, 3)
# ┌──────┬────────┬────────┐
# │ s    ┆ length ┆ nchars │
# │ ---  ┆ ---    ┆ ---    │
# │ str  ┆ u32    ┆ u32    │
# ╞══════╪════════╪════════╡
# │ Café ┆ 5      ┆ 4      │
# │ null ┆ null   ┆ null   │
# │ 345  ┆ 3      ┆ 3      │
# │ 東京 ┆ 6      ┆ 2      │
# └──────┴────────┴────────┘

Returns:



325
326
327
# File 'lib/polars/string_expr.rb', line 325

def len_chars
  Utils.wrap_expr(_rbexpr.str_len_chars)
end

#normalize(form = "NFC") ⇒ Expr

Returns the Unicode normal form of the string values.

This uses the forms described in Unicode Standard Annex 15: https://www.unicode.org/reports/tr15/.

Examples:

df = Polars::DataFrame.new({"text" => ["01²", "KADOKAWA"]})
new = df.with_columns(
  nfc: Polars.col("text").str.normalize("NFC"),
  nfkc: Polars.col("text").str.normalize("NFKC")
)
# =>
# shape: (2, 3)
# ┌──────────────────┬──────────────────┬──────────┐
# │ text             ┆ nfc              ┆ nfkc     │
# │ ---              ┆ ---              ┆ ---      │
# │ str              ┆ str              ┆ str      │
# ╞══════════════════╪══════════════════╪══════════╡
# │ 01²              ┆ 01²              ┆ 012      │
# │ KADOKAWA ┆ KADOKAWA ┆ KADOKAWA │
# └──────────────────┴──────────────────┴──────────┘
new.select(Polars.all.str.len_bytes)
# =>
# shape: (2, 3)
# ┌──────┬─────┬──────┐
# │ text ┆ nfc ┆ nfkc │
# │ ---  ┆ --- ┆ ---  │
# │ u32  ┆ u32 ┆ u32  │
# ╞══════╪═════╪══════╡
# │ 4    ┆ 4   ┆ 3    │
# │ 24   ┆ 24  ┆ 8    │
# └──────┴─────┴──────┘

Parameters:

  • form ('NFC', 'NFKC', 'NFD', 'NFKD') (defaults to: "NFC")

    Unicode form to use.

Returns:



430
431
432
# File 'lib/polars/string_expr.rb', line 430

def normalize(form = "NFC")
  Utils.wrap_expr(_rbexpr.str_normalize(form))
end

#pad_end(length, fill_char = " ") ⇒ Expr Also known as: ljust

Pad the end of the string until it reaches the given length.

Examples:

df = Polars::DataFrame.new({"a": ["cow", "monkey", "hippopotamus", nil]})
df.with_columns(padded: Polars.col("a").str.pad_end(8, "*"))
# =>
# shape: (4, 2)
# ┌──────────────┬──────────────┐
# │ a            ┆ padded       │
# │ ---          ┆ ---          │
# │ str          ┆ str          │
# ╞══════════════╪══════════════╡
# │ cow          ┆ cow*****     │
# │ monkey       ┆ monkey**     │
# │ hippopotamus ┆ hippopotamus │
# │ null         ┆ null         │
# └──────────────┴──────────────┘

Parameters:

  • length (Integer)

    Pad the string until it reaches this length. Strings with length equal to or greater than this value are returned as-is.

  • fill_char (String) (defaults to: " ")

    The character to pad the string with.

Returns:



695
696
697
698
# File 'lib/polars/string_expr.rb', line 695

def pad_end(length, fill_char = " ")
  length = Utils.parse_into_expression(length)
  Utils.wrap_expr(_rbexpr.str_pad_end(length, fill_char))
end

#pad_start(length, fill_char = " ") ⇒ Expr Also known as: rjust

Pad the start of the string until it reaches the given length.

Examples:

df = Polars::DataFrame.new({"a": ["cow", "monkey", "hippopotamus", nil]})
df.with_columns(padded: Polars.col("a").str.pad_start(8, "*"))
# =>
# shape: (4, 2)
# ┌──────────────┬──────────────┐
# │ a            ┆ padded       │
# │ ---          ┆ ---          │
# │ str          ┆ str          │
# ╞══════════════╪══════════════╡
# │ cow          ┆ *****cow     │
# │ monkey       ┆ **monkey     │
# │ hippopotamus ┆ hippopotamus │
# │ null         ┆ null         │
# └──────────────┴──────────────┘

Parameters:

  • length (Integer)

    Pad the string until it reaches this length. Strings with length equal to or greater than this value are returned as-is.

  • fill_char (String) (defaults to: " ")

    The character to pad the string with.

Returns:



664
665
666
667
# File 'lib/polars/string_expr.rb', line 664

def pad_start(length, fill_char = " ")
  length = Utils.parse_into_expression(length)
  Utils.wrap_expr(_rbexpr.str_pad_start(length, fill_char))
end

#parse_int(radix = 2, strict: true) ⇒ Expr

Parse integers with base radix from strings.

By default base 2. ParseError/Overflows become Nulls.

Examples:

df = Polars::DataFrame.new({"bin" => ["110", "101", "010", "invalid"]})
df.select(Polars.col("bin").str.parse_int(2, strict: false))
# =>
# shape: (4, 1)
# ┌──────┐
# │ bin  │
# │ ---  │
# │ i32  │
# ╞══════╡
# │ 6    │
# │ 5    │
# │ 2    │
# │ null │
# └──────┘

Parameters:

  • radix (Integer) (defaults to: 2)

    Positive integer which is the base of the string we are parsing. Default: 2.

  • strict (Boolean) (defaults to: true)

    Bool, Default=true will raise any ParseError or overflow as ComputeError. False silently convert to Null.

Returns:



1635
1636
1637
# File 'lib/polars/string_expr.rb', line 1635

def parse_int(radix = 2, strict: true)
  to_integer(base: 2, strict: strict).cast(Int32, strict: strict)
end

#replace(pattern, value, literal: false, n: 1) ⇒ Expr

Replace first matching regex/literal substring with a new string value.

Examples:

df = Polars::DataFrame.new({"id" => [1, 2], "text" => ["123abc", "abc456"]})
df.with_column(
  Polars.col("text").str.replace('abc\b', "ABC")
)
# =>
# shape: (2, 2)
# ┌─────┬────────┐
# │ id  ┆ text   │
# │ --- ┆ ---    │
# │ i64 ┆ str    │
# ╞═════╪════════╡
# │ 1   ┆ 123ABC │
# │ 2   ┆ abc456 │
# └─────┴────────┘

Parameters:

  • pattern (String)

    Regex pattern.

  • value (String)

    Replacement string.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

  • n (Integer) (defaults to: 1)

    Number of matches to replace.

Returns:



1345
1346
1347
1348
1349
# File 'lib/polars/string_expr.rb', line 1345

def replace(pattern, value, literal: false, n: 1)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  value = Utils.parse_into_expression(value, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_n(pattern, value, literal, n))
end

#replace_all(pattern, value, literal: false) ⇒ Expr

Replace all matching regex/literal substrings with a new string value.

Examples:

df = Polars::DataFrame.new({"id" => [1, 2], "text" => ["abcabc", "123a123"]})
df.with_column(Polars.col("text").str.replace_all("a", "-"))
# =>
# shape: (2, 2)
# ┌─────┬─────────┐
# │ id  ┆ text    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 1   ┆ -bc-bc  │
# │ 2   ┆ 123-123 │
# └─────┴─────────┘

Parameters:

  • pattern (String)

    Regex pattern.

  • value (String)

    Replacement string.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

Returns:



1375
1376
1377
1378
1379
# File 'lib/polars/string_expr.rb', line 1375

def replace_all(pattern, value, literal: false)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  value = Utils.parse_into_expression(value, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_all(pattern, value, literal))
end

#replace_many(patterns, replace_with = Expr::NO_DEFAULT, ascii_case_insensitive: false) ⇒ Expr

Use the aho-corasick algorithm to replace many matches.

Examples:

df = Polars::DataFrame.new(
  {
    "lyrics": [
      "Everybody wants to rule the world",
      "Tell me what you want, what you really really want",
      "Can you feel the love tonight"
    ]
  }
)
df.with_columns(
  Polars.col("lyrics")
  .str.replace_many(
    ["me", "you", "they"],
    [""]
  )
  .alias("removes_pronouns")
)
# =>
# shape: (3, 2)
# ┌─────────────────────────────────┬─────────────────────────────────┐
# │ lyrics                          ┆ removes_pronouns                │
# │ ---                             ┆ ---                             │
# │ str                             ┆ str                             │
# ╞═════════════════════════════════╪═════════════════════════════════╡
# │ Everybody wants to rule the wo… ┆ Everybody wants to rule the wo… │
# │ Tell me what you want, what yo… ┆ Tell  what  want, what  really… │
# │ Can you feel the love tonight   ┆ Can  feel the love tonight      │
# └─────────────────────────────────┴─────────────────────────────────┘
df.with_columns(
  Polars.col("lyrics")
  .str.replace_many(
    ["me", "you"],
    ["you", "me"]
  )
  .alias("confusing")
)
# =>
# shape: (3, 2)
# ┌─────────────────────────────────┬─────────────────────────────────┐
# │ lyrics                          ┆ confusing                       │
# │ ---                             ┆ ---                             │
# │ str                             ┆ str                             │
# ╞═════════════════════════════════╪═════════════════════════════════╡
# │ Everybody wants to rule the wo… ┆ Everybody wants to rule the wo… │
# │ Tell me what you want, what yo… ┆ Tell you what me want, what me… │
# │ Can you feel the love tonight   ┆ Can me feel the love tonight    │
# └─────────────────────────────────┴─────────────────────────────────┘

Parameters:

  • patterns (Object)

    String patterns to search and replace.

  • replace_with (Object) (defaults to: Expr::NO_DEFAULT)

    Strings to replace where a pattern was a match. This can be broadcasted. So it supports many:one and many:many.

  • ascii_case_insensitive (Boolean) (defaults to: false)

    Enable ASCII-aware case insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.

Returns:



1747
1748
1749
1750
1751
1752
1753
1754
1755
1756
1757
1758
1759
1760
1761
1762
1763
1764
1765
1766
1767
1768
# File 'lib/polars/string_expr.rb', line 1747

def replace_many(patterns, replace_with = Expr::NO_DEFAULT, ascii_case_insensitive: false)
  if replace_with == Expr::NO_DEFAULT
    if !patterns.is_a?(Hash)
      msg = "`replace_with` argument is required if `patterns` argument is not a Hash type"
      raise TypeError, msg
    end
    # Early return in case of an empty mapping.
    if patterns.empty?
      return Utils.wrap_expr(_rbexpr)
    end
    replace_with = patterns.values
    patterns = patterns.keys
  end

  patterns = Utils.parse_into_expression(patterns, str_as_lit: false)
  replace_with = Utils.parse_into_expression(replace_with, str_as_lit: true)
  Utils.wrap_expr(
    _rbexpr.str_replace_many(
      patterns, replace_with, ascii_case_insensitive
    )
  )
end

#reverseExpr

Returns string values in reversed order.

Examples:

df = Polars::DataFrame.new({"text" => ["foo", "bar", "man\u0303ana"]})
df.with_columns(Polars.col("text").str.reverse.alias("reversed"))
# =>
# shape: (3, 2)
# ┌────────┬──────────┐
# │ text   ┆ reversed │
# │ ---    ┆ ---      │
# │ str    ┆ str      │
# ╞════════╪══════════╡
# │ foo    ┆ oof      │
# │ bar    ┆ rab      │
# │ mañana ┆ anañam   │
# └────────┴──────────┘

Returns:



1399
1400
1401
# File 'lib/polars/string_expr.rb', line 1399

def reverse
  Utils.wrap_expr(_rbexpr.str_reverse)
end

#slice(offset, length = nil) ⇒ Expr

Create subslices of the string values of a Utf8 Series.

Examples:

df = Polars::DataFrame.new({"s" => ["pear", nil, "papaya", "dragonfruit"]})
df.with_column(
  Polars.col("s").str.slice(-3).alias("s_sliced")
)
# =>
# shape: (4, 2)
# ┌─────────────┬──────────┐
# │ s           ┆ s_sliced │
# │ ---         ┆ ---      │
# │ str         ┆ str      │
# ╞═════════════╪══════════╡
# │ pear        ┆ ear      │
# │ null        ┆ null     │
# │ papaya      ┆ aya      │
# │ dragonfruit ┆ uit      │
# └─────────────┴──────────┘

Parameters:

  • offset (Integer)

    Start index. Negative indexing is supported.

  • length (Integer) (defaults to: nil)

    Length of the slice. If set to nil (default), the slice is taken to the end of the string.

Returns:



1430
1431
1432
1433
1434
# File 'lib/polars/string_expr.rb', line 1430

def slice(offset, length = nil)
  offset = Utils.parse_into_expression(offset)
  length = Utils.parse_into_expression(length)
  Utils.wrap_expr(_rbexpr.str_slice(offset, length))
end

#split(by, inclusive: false) ⇒ Expr

Split the string by a substring.

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", "foo-bar", "foo bar baz"]})
df.select(Polars.col("s").str.split(" "))
# =>
# shape: (3, 1)
# ┌───────────────────────┐
# │ s                     │
# │ ---                   │
# │ list[str]             │
# ╞═══════════════════════╡
# │ ["foo", "bar"]        │
# │ ["foo-bar"]           │
# │ ["foo", "bar", "baz"] │
# └───────────────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • inclusive (Boolean) (defaults to: false)

    If true, include the split character/string in the results.

Returns:



1233
1234
1235
1236
1237
1238
1239
# File 'lib/polars/string_expr.rb', line 1233

def split(by, inclusive: false)
  by = Utils.parse_into_expression(by, str_as_lit: true)
  if inclusive
    return Utils.wrap_expr(_rbexpr.str_split_inclusive(by))
  end
  Utils.wrap_expr(_rbexpr.str_split(by))
end

#split_exact(by, n, inclusive: false) ⇒ Expr

Split the string by a substring using n splits.

Results in a struct of n+1 fields.

If it cannot make n splits, the remaining field elements will be null.

Examples:

df = Polars::DataFrame.new({"x" => ["a_1", nil, "c", "d_4"]})
df.select(
  [
    Polars.col("x").str.split_exact("_", 1).alias("fields")
  ]
)
# =>
# shape: (4, 1)
# ┌─────────────┐
# │ fields      │
# │ ---         │
# │ struct[2]   │
# ╞═════════════╡
# │ {"a","1"}   │
# │ {null,null} │
# │ {"c",null}  │
# │ {"d","4"}   │
# └─────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • n (Integer)

    Number of splits to make.

  • inclusive (Boolean) (defaults to: false)

    If true, include the split character/string in the results.

Returns:



1275
1276
1277
1278
1279
1280
1281
1282
# File 'lib/polars/string_expr.rb', line 1275

def split_exact(by, n, inclusive: false)
  by = Utils.parse_into_expression(by, str_as_lit: true)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_exact_inclusive(by, n))
  else
    Utils.wrap_expr(_rbexpr.str_split_exact(by, n))
  end
end

#splitn(by, n) ⇒ Expr

Split the string by a substring, restricted to returning at most n items.

If the number of possible splits is less than n-1, the remaining field elements will be null. If the number of possible splits is n-1 or greater, the last (nth) substring will contain the remainder of the string.

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", nil, "foo-bar", "foo bar baz"]})
df.select(Polars.col("s").str.splitn(" ", 2).alias("fields"))
# =>
# shape: (4, 1)
# ┌───────────────────┐
# │ fields            │
# │ ---               │
# │ struct[2]         │
# ╞═══════════════════╡
# │ {"foo","bar"}     │
# │ {null,null}       │
# │ {"foo-bar",null}  │
# │ {"foo","bar baz"} │
# └───────────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • n (Integer)

    Max number of items to return.

Returns:



1312
1313
1314
1315
# File 'lib/polars/string_expr.rb', line 1312

def splitn(by, n)
  by = Utils.parse_into_expression(by, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_splitn(by, n))
end

#starts_with(sub) ⇒ Expr

Check if string values start with a substring.

Examples:

df = Polars::DataFrame.new({"fruits" => ["apple", "mango", nil]})
df.with_column(
  Polars.col("fruits").str.starts_with("app").alias("has_prefix")
)
# =>
# shape: (3, 2)
# ┌────────┬────────────┐
# │ fruits ┆ has_prefix │
# │ ---    ┆ ---        │
# │ str    ┆ bool       │
# ╞════════╪════════════╡
# │ apple  ┆ true       │
# │ mango  ┆ false      │
# │ null   ┆ null       │
# └────────┴────────────┘

Using starts_with as a filter condition:

df.filter(Polars.col("fruits").str.starts_with("app"))
# =>
# shape: (1, 1)
# ┌────────┐
# │ fruits │
# │ ---    │
# │ str    │
# ╞════════╡
# │ apple  │
# └────────┘

Parameters:

  • sub (String)

    Prefix substring.

Returns:



910
911
912
913
# File 'lib/polars/string_expr.rb', line 910

def starts_with(sub)
  sub = Utils.parse_into_expression(sub, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_starts_with(sub))
end

#strip_chars(characters = nil) ⇒ Expr Also known as: strip

Remove leading and trailing whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.strip)
# =>
# shape: (3, 1)
# ┌───────┐
# │ foo   │
# │ ---   │
# │ str   │
# ╞═══════╡
# │ lead  │
# │ trail │
# │ both  │
# └───────┘

Parameters:

  • characters (String, nil) (defaults to: nil)

    An optional single character that should be trimmed.

Returns:



520
521
522
523
# File 'lib/polars/string_expr.rb', line 520

def strip_chars(characters = nil)
  characters = Utils.parse_into_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars(characters))
end

#strip_chars_end(characters = nil) ⇒ Expr Also known as: rstrip

Remove trailing whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.rstrip)
# =>
# shape: (3, 1)
# ┌───────┐
# │ foo   │
# │ ---   │
# │ str   │
# ╞═══════╡
# │  lead │
# │ trail │
# │  both │
# └───────┘

Parameters:

  • characters (String, nil) (defaults to: nil)

    An optional single character that should be trimmed.

Returns:



574
575
576
577
# File 'lib/polars/string_expr.rb', line 574

def strip_chars_end(characters = nil)
  characters = Utils.parse_into_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars_end(characters))
end

#strip_chars_start(characters = nil) ⇒ Expr Also known as: lstrip

Remove leading whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.lstrip)
# =>
# shape: (3, 1)
# ┌────────┐
# │ foo    │
# │ ---    │
# │ str    │
# ╞════════╡
# │ lead   │
# │ trail  │
# │ both   │
# └────────┘

Parameters:

  • characters (String, nil) (defaults to: nil)

    An optional single character that should be trimmed.

Returns:



547
548
549
550
# File 'lib/polars/string_expr.rb', line 547

def strip_chars_start(characters = nil)
  characters = Utils.parse_into_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars_start(characters))
end

#strip_prefix(prefix) ⇒ Expr

Remove prefix.

The prefix will be removed from the string exactly once, if found.

Examples:

df = Polars::DataFrame.new({"a" => ["foobar", "foofoobar", "foo", "bar"]})
df.with_columns(Polars.col("a").str.strip_prefix("foo").alias("stripped"))
# =>
# shape: (4, 2)
# ┌───────────┬──────────┐
# │ a         ┆ stripped │
# │ ---       ┆ ---      │
# │ str       ┆ str      │
# ╞═══════════╪══════════╡
# │ foobar    ┆ bar      │
# │ foofoobar ┆ foobar   │
# │ foo       ┆          │
# │ bar       ┆ bar      │
# └───────────┴──────────┘

Parameters:

  • prefix (String)

    The prefix to be removed.

Returns:



604
605
606
607
# File 'lib/polars/string_expr.rb', line 604

def strip_prefix(prefix)
  prefix = Utils.parse_into_expression(prefix, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_prefix(prefix))
end

#strip_suffix(suffix) ⇒ Expr

Remove suffix.

The suffix will be removed from the string exactly once, if found.

Examples:

df = Polars::DataFrame.new({"a" => ["foobar", "foobarbar", "foo", "bar"]})
df.with_columns(Polars.col("a").str.strip_suffix("bar").alias("stripped"))
# =>
# shape: (4, 2)
# ┌───────────┬──────────┐
# │ a         ┆ stripped │
# │ ---       ┆ ---      │
# │ str       ┆ str      │
# ╞═══════════╪══════════╡
# │ foobar    ┆ foo      │
# │ foobarbar ┆ foobar   │
# │ foo       ┆ foo      │
# │ bar       ┆          │
# └───────────┴──────────┘

Parameters:

  • suffix (String)

    The suffix to be removed.

Returns:



634
635
636
637
# File 'lib/polars/string_expr.rb', line 634

def strip_suffix(suffix)
  suffix = Utils.parse_into_expression(suffix, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_suffix(suffix))
end

#strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false) ⇒ Expr

Note:

When parsing a Datetime the column precision will be inferred from the format string, if given, eg: "%F %T%.3f" => Datetime("ms"). If no fractional second component is found then the default is "us".

Parse a Utf8 expression to a Date/Datetime/Time type.

Examples:

Dealing with a consistent format:

s = Polars::Series.new(["2020-01-01 01:00Z", "2020-01-01 02:00Z"])
s.str.strptime(Polars::Datetime, "%Y-%m-%d %H:%M%#z")
# =>
# shape: (2,)
# Series: '' [datetime[μs, UTC]]
# [
#         2020-01-01 01:00:00 UTC
#         2020-01-01 02:00:00 UTC
# ]

Dealing with different formats.

s = Polars::Series.new(
  "date",
  [
    "2021-04-22",
    "2022-01-04 00:00:00",
    "01/31/22",
    "Sun Jul  8 00:34:60 2001",
  ]
)
s.to_frame.select(
  Polars.coalesce(
    Polars.col("date").str.strptime(Polars::Date, "%F", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%F %T", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%D", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%c", strict: false)
  )
).to_series
# =>
# shape: (4,)
# Series: 'date' [date]
# [
#         2021-04-22
#         2022-01-04
#         2022-01-31
#         2001-07-08
# ]

Parameters:

  • dtype (Object)

    The data type to convert into. Can be either Date, Datetime, or Time.

  • format (String) (defaults to: nil)

    Format to use, refer to the chrono strftime documentation for specification. Example: "%y-%m-%d".

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)
    • If true, require an exact format match.
    • If false, allow the format to match anywhere in the target string.
  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted dates to apply the datetime conversion.

  • utc (Boolean) (defaults to: false)

    Parse timezone aware datetimes as UTC. This may be useful if you have data with mixed offsets.

Returns:



206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
# File 'lib/polars/string_expr.rb', line 206

def strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false)
  _validate_format_argument(format)

  if dtype == Date
    to_date(format, strict: strict, exact: exact, cache: cache)
  elsif dtype == Datetime || dtype.is_a?(Datetime)
    dtype = Datetime.new if dtype == Datetime
    time_unit = dtype.time_unit
    time_zone = dtype.time_zone
    to_datetime(format, time_unit: time_unit, time_zone: time_zone, strict: strict, exact: exact, cache: cache)
  elsif dtype == Time
    to_time(format, strict: strict, cache: cache)
  else
    raise ArgumentError, "dtype should be of type {Date, Datetime, Time}"
  end
end

#tail(n) ⇒ Expr

Note:

1) The n input is defined in terms of the number of characters in the (UTF8) string. A character is defined as a Unicode scalar value. A single character is represented by a single byte when working with ASCII text, and a maximum of 4 bytes otherwise.

2) When the n input is negative, tail returns characters starting from the nth from the beginning of the string. For example, if n = -3, then all characters except the first three are returned.

3) If the length of the string has fewer than n characters, the full string is returned.

Return the last n characters of each string in a String Series.

Examples:

Return up to the last 5 characters:

df = Polars::DataFrame.new({"s" => ["pear", nil, "papaya", "dragonfruit"]})
df.with_columns(Polars.col("s").str.tail(5).alias("s_tail_5"))
# =>
# shape: (4, 2)
# ┌─────────────┬──────────┐
# │ s           ┆ s_tail_5 │
# │ ---         ┆ ---      │
# │ str         ┆ str      │
# ╞═════════════╪══════════╡
# │ pear        ┆ pear     │
# │ null        ┆ null     │
# │ papaya      ┆ apaya    │
# │ dragonfruit ┆ fruit    │
# └─────────────┴──────────┘

Return characters determined by column n:

df = Polars::DataFrame.new(
  {
    "s" => ["pear", nil, "papaya", "dragonfruit"],
    "n" => [3, 4, -2, -5]
  }
)
df.with_columns(Polars.col("s").str.tail("n").alias("s_tail_n"))
# =>
# shape: (4, 3)
# ┌─────────────┬─────┬──────────┐
# │ s           ┆ n   ┆ s_tail_n │
# │ ---         ┆ --- ┆ ---      │
# │ str         ┆ i64 ┆ str      │
# ╞═════════════╪═════╪══════════╡
# │ pear        ┆ 3   ┆ ear      │
# │ null        ┆ 4   ┆ null     │
# │ papaya      ┆ -2  ┆ paya     │
# │ dragonfruit ┆ -5  ┆ nfruit   │
# └─────────────┴─────┴──────────┘

Parameters:

  • n (Integer)

    Length of the slice (integer or expression). Negative indexing is supported; see note (2) below.

Returns:



1555
1556
1557
1558
# File 'lib/polars/string_expr.rb', line 1555

def tail(n)
  n = Utils.parse_into_expression(n)
  Utils.wrap_expr(_rbexpr.str_tail(n))
end

#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Expr

Convert a Utf8 column into a Date column.

Examples:

s = Polars::Series.new(["2020/01/01", "2020/02/01", "2020/03/01"])
s.str.to_date
# =>
# shape: (3,)
# Series: '' [date]
# [
#         2020-01-01
#         2020-02-01
#         2020-03-01
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d". If set to nil (default), the format is inferred from the data.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)

    Require an exact format match. If false, allow the format to match anywhere in the target string.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted dates to apply the conversion.

Returns:



40
41
42
43
# File 'lib/polars/string_expr.rb', line 40

def to_date(format = nil, strict: true, exact: true, cache: true)
  _validate_format_argument(format)
  Utils.wrap_expr(_rbexpr.str_to_date(format, strict, exact, cache))
end

#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Expr

Convert a Utf8 column into a Datetime column.

Examples:

s = Polars::Series.new(["2020-01-01 01:00Z", "2020-01-01 02:00Z"])
s.str.to_datetime("%Y-%m-%d %H:%M%#z")
# =>
# shape: (2,)
# Series: '' [datetime[μs, UTC]]
# [
#         2020-01-01 01:00:00 UTC
#         2020-01-01 02:00:00 UTC
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d %H:%M:%S". If set to nil (default), the format is inferred from the data.

  • time_unit ("us", "ns", "ms") (defaults to: nil)

    Unit of time for the resulting Datetime column. If set to nil (default), the time unit is inferred from the format string if given, eg: "%F %T%.3f" => Datetime("ms"). If no fractional second component is found, the default is "us".

  • time_zone (String) (defaults to: nil)

    Time zone for the resulting Datetime column.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)

    Require an exact format match. If false, allow the format to match anywhere in the target string.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted datetimes to apply the conversion.

  • ambiguous ('raise', 'earliest', 'latest', 'null') (defaults to: "raise")

    Determine how to deal with ambiguous datetimes:

    • 'raise' (default): raise
    • 'earliest': use the earliest datetime
    • 'latest': use the latest datetime
    • 'null': set to null

Returns:



86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
# File 'lib/polars/string_expr.rb', line 86

def to_datetime(
  format = nil,
  time_unit: nil,
  time_zone: nil,
  strict: true,
  exact: true,
  cache: true,
  ambiguous: "raise"
)
  _validate_format_argument(format)
  unless ambiguous.is_a?(Expr)
    ambiguous = Polars.lit(ambiguous)
  end
  Utils.wrap_expr(
    _rbexpr.str_to_datetime(
      format,
      time_unit,
      time_zone,
      strict,
      exact,
      cache,
      ambiguous._rbexpr
    )
  )
end

#to_decimal(scale:) ⇒ Expr

Convert a String column into a Decimal column.

Examples:

df = Polars::DataFrame.new(
  {
    "numbers": [
      "40.12",
      "3420.13",
      "120134.19",
      "3212.98",
      "12.90",
      "143.09",
      "143.9"
    ]
  }
)
df.with_columns(numbers_decimal: Polars.col("numbers").str.to_decimal(scale: 2))
# =>
# shape: (7, 2)
# ┌───────────┬─────────────────┐
# │ numbers   ┆ numbers_decimal │
# │ ---       ┆ ---             │
# │ str       ┆ decimal[38,2]   │
# ╞═══════════╪═════════════════╡
# │ 40.12     ┆ 40.12           │
# │ 3420.13   ┆ 3420.13         │
# │ 120134.19 ┆ 120134.19       │
# │ 3212.98   ┆ 3212.98         │
# │ 12.90     ┆ 12.90           │
# │ 143.09    ┆ 143.09          │
# │ 143.9     ┆ 143.90          │
# └───────────┴─────────────────┘

Parameters:

  • scale (Integer)

    Number of digits after the comma to use for the decimals.

Returns:



260
261
262
# File 'lib/polars/string_expr.rb', line 260

def to_decimal(scale:)
  Utils.wrap_expr(_rbexpr.str_to_decimal(scale))
end

#to_integer(base: 10, dtype: Int64, strict: true) ⇒ Expr

Convert an Utf8 column into an Int64 column with base radix.

Examples:

df = Polars::DataFrame.new({"bin" => ["110", "101", "010", "invalid"]})
df.with_columns(Polars.col("bin").str.to_integer(base: 2, strict: false).alias("parsed"))
# =>
# shape: (4, 2)
# ┌─────────┬────────┐
# │ bin     ┆ parsed │
# │ ---     ┆ ---    │
# │ str     ┆ i64    │
# ╞═════════╪════════╡
# │ 110     ┆ 6      │
# │ 101     ┆ 5      │
# │ 010     ┆ 2      │
# │ invalid ┆ null   │
# └─────────┴────────┘
df = Polars::DataFrame.new({"hex" => ["fa1e", "ff00", "cafe", nil]})
df.with_columns(Polars.col("hex").str.to_integer(base: 16, strict: true).alias("parsed"))
# =>
# shape: (4, 2)
# ┌──────┬────────┐
# │ hex  ┆ parsed │
# │ ---  ┆ ---    │
# │ str  ┆ i64    │
# ╞══════╪════════╡
# │ fa1e ┆ 64030  │
# │ ff00 ┆ 65280  │
# │ cafe ┆ 51966  │
# │ null ┆ null   │
# └──────┴────────┘

Parameters:

  • base (Integer) (defaults to: 10)

    Positive integer which is the base of the string we are parsing. Default: 10.

  • strict (Boolean) (defaults to: true)

    Bool, default=true will raise any ParseError or overflow as ComputeError. false silently convert to Null.

Returns:



1602
1603
1604
1605
# File 'lib/polars/string_expr.rb', line 1602

def to_integer(base: 10, dtype: Int64, strict: true)
  base = Utils.parse_into_expression(base, str_as_lit: false)
  Utils.wrap_expr(_rbexpr.str_to_integer(base, dtype, strict))
end

#to_lowercaseExpr

Transform to lowercase variant.

Examples:

df = Polars::DataFrame.new({"foo" => ["CAT", "DOG"]})
df.select(Polars.col("foo").str.to_lowercase)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ cat │
# │ dog │
# └─────┘

Returns:



472
473
474
# File 'lib/polars/string_expr.rb', line 472

def to_lowercase
  Utils.wrap_expr(_rbexpr.str_to_lowercase)
end

#to_time(format = nil, strict: true, cache: true) ⇒ Expr

Convert a Utf8 column into a Time column.

Examples:

s = Polars::Series.new(["01:00", "02:00", "03:00"])
s.str.to_time("%H:%M")
# =>
# shape: (3,)
# Series: '' [time]
# [
#         01:00:00
#         02:00:00
#         03:00:00
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%H:%M:%S". If set to nil (default), the format is inferred from the data.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted times to apply the conversion.

Returns:



137
138
139
140
# File 'lib/polars/string_expr.rb', line 137

def to_time(format = nil, strict: true, cache: true)
  _validate_format_argument(format)
  Utils.wrap_expr(_rbexpr.str_to_time(format, strict, cache))
end

#to_titlecaseExpr

Transform to titlecase variant.

Examples:

df = Polars::DataFrame.new(
  {"sing": ["welcome to my world", "THERE'S NO TURNING BACK"]}
)
df.with_columns(foo_title: Polars.col("sing").str.to_titlecase)
# =>
# shape: (2, 2)
# ┌─────────────────────────┬─────────────────────────┐
# │ sing                    ┆ foo_title               │
# │ ---                     ┆ ---                     │
# │ str                     ┆ str                     │
# ╞═════════════════════════╪═════════════════════════╡
# │ welcome to my world     ┆ Welcome To My World     │
# │ THERE'S NO TURNING BACK ┆ There's No Turning Back │
# └─────────────────────────┴─────────────────────────┘

Returns:



495
496
497
# File 'lib/polars/string_expr.rb', line 495

def to_titlecase
  Utils.wrap_expr(_rbexpr.str_to_titlecase)
end

#to_uppercaseExpr

Transform to uppercase variant.

Examples:

df = Polars::DataFrame.new({"foo" => ["cat", "dog"]})
df.select(Polars.col("foo").str.to_uppercase)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ CAT │
# │ DOG │
# └─────┘

Returns:



451
452
453
# File 'lib/polars/string_expr.rb', line 451

def to_uppercase
  Utils.wrap_expr(_rbexpr.str_to_uppercase)
end

#zfill(length) ⇒ Expr

Fills the string with zeroes.

Return a copy of the string left filled with ASCII '0' digits to make a string of length width.

A leading sign prefix ('+'/'-') is handled by inserting the padding after the sign character rather than before. The original string is returned if width is less than or equal to s.length.

Examples:

df = Polars::DataFrame.new({"a" => [-1, 123, 999999, nil]})
df.with_columns(Polars.col("a").cast(Polars::String).str.zfill(4).alias("zfill"))
# =>
# shape: (4, 2)
# ┌────────┬────────┐
# │ a      ┆ zfill  │
# │ ---    ┆ ---    │
# │ i64    ┆ str    │
# ╞════════╪════════╡
# │ -1     ┆ -001   │
# │ 123    ┆ 0123   │
# │ 999999 ┆ 999999 │
# │ null   ┆ null   │
# └────────┴────────┘

Parameters:

  • length (Integer)

    Fill the value up to this length

Returns:



730
731
732
733
# File 'lib/polars/string_expr.rb', line 730

def zfill(length)
  length = Utils.parse_into_expression(length)
  Utils.wrap_expr(_rbexpr.str_zfill(length))
end