Class: Polars::StringExpr

Inherits:
Object
  • Object
show all
Defined in:
lib/polars/string_expr.rb

Overview

Namespace for string related expressions.

Instance Method Summary collapse

Instance Method Details

#concat(delimiter = "-", ignore_nulls: true) ⇒ Expr

Vertically concat the values in the Series to a single string value.

Examples:

df = Polars::DataFrame.new({"foo" => [1, nil, 2]})
df.select(Polars.col("foo").str.concat("-"))
# =>
# shape: (1, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ 1-2 │
# └─────┘
df = Polars::DataFrame.new({"foo" => [1, nil, 2]})
df.select(Polars.col("foo").str.concat("-", ignore_nulls: false))
# =>
# shape: (1, 1)
# ┌──────┐
# │ foo  │
# │ ---  │
# │ str  │
# ╞══════╡
# │ null │
# └──────┘

Parameters:

  • delimiter (String) (defaults to: "-")

    The delimiter to insert between consecutive string values.

  • ignore_nulls (Boolean) (defaults to: true)

    Ignore null values (default).

Returns:



312
313
314
# File 'lib/polars/string_expr.rb', line 312

def concat(delimiter = "-", ignore_nulls: true)
  Utils.wrap_expr(_rbexpr.str_concat(delimiter, ignore_nulls))
end

#contains(pattern, literal: false, strict: true) ⇒ Expr

Check if string contains a substring that matches a regex.

Examples:

df = Polars::DataFrame.new({"a" => ["Crab", "cat and dog", "rab$bit", nil]})
df.select(
  [
    Polars.col("a"),
    Polars.col("a").str.contains("cat|bit").alias("regex"),
    Polars.col("a").str.contains("rab$", literal: true).alias("literal")
  ]
)
# =>
# shape: (4, 3)
# ┌─────────────┬───────┬─────────┐
# │ a           ┆ regex ┆ literal │
# │ ---         ┆ ---   ┆ ---     │
# │ str         ┆ bool  ┆ bool    │
# ╞═════════════╪═══════╪═════════╡
# │ Crab        ┆ false ┆ false   │
# │ cat and dog ┆ true  ┆ false   │
# │ rab$bit     ┆ true  ┆ true    │
# │ null        ┆ null  ┆ null    │
# └─────────────┴───────┴─────────┘

Parameters:

  • pattern (String)

    A valid regex pattern.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

Returns:



577
578
579
580
# File 'lib/polars/string_expr.rb', line 577

def contains(pattern, literal: false, strict: true)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)._rbexpr
  Utils.wrap_expr(_rbexpr.str_contains(pattern, literal, strict))
end

#count_matches(pattern, literal: false) ⇒ Expr Also known as: count_match

Count all successive non-overlapping regex matches.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.count_match('\d').alias("count_digits")
  ]
)
# =>
# shape: (2, 1)
# ┌──────────────┐
# │ count_digits │
# │ ---          │
# │ u32          │
# ╞══════════════╡
# │ 5            │
# │ 6            │
# └──────────────┘

Parameters:

  • pattern (String)

    A valid regex pattern

Returns:



886
887
888
889
# File 'lib/polars/string_expr.rb', line 886

def count_matches(pattern, literal: false)
  pattern = Utils.parse_as_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_count_matches(pattern, literal))
end

#decode(encoding, strict: true) ⇒ Expr

Decode a value using the provided encoding.

Examples:

df = Polars::DataFrame.new({"encoded" => ["666f6f", "626172", nil]})
df.select(Polars.col("encoded").str.decode("hex"))
# =>
# shape: (3, 1)
# ┌───────────────┐
# │ encoded       │
# │ ---           │
# │ binary        │
# ╞═══════════════╡
# │ [binary data] │
# │ [binary data] │
# │ null          │
# └───────────────┘

Parameters:

  • encoding ("hex", "base64")

    The encoding to use.

  • strict (Boolean) (defaults to: true)

    How to handle invalid inputs:

    • true: An error will be thrown if unable to decode a value.
    • false: Unhandled values will be replaced with nil.

Returns:



757
758
759
760
761
762
763
764
765
# File 'lib/polars/string_expr.rb', line 757

def decode(encoding, strict: true)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_decode(strict))
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_decode(strict))
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end

#encode(encoding) ⇒ Expr

Encode a value using the provided encoding.

Examples:

df = Polars::DataFrame.new({"strings" => ["foo", "bar", nil]})
df.select(Polars.col("strings").str.encode("hex"))
# =>
# shape: (3, 1)
# ┌─────────┐
# │ strings │
# │ ---     │
# │ str     │
# ╞═════════╡
# │ 666f6f  │
# │ 626172  │
# │ null    │
# └─────────┘

Parameters:

  • encoding ("hex", "base64")

    The encoding to use.

Returns:



788
789
790
791
792
793
794
795
796
# File 'lib/polars/string_expr.rb', line 788

def encode(encoding)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_encode)
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_encode)
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end

#ends_with(sub) ⇒ Expr

Check if string values end with a substring.

Examples:

df = Polars::DataFrame.new({"fruits" => ["apple", "mango", nil]})
df.with_column(
  Polars.col("fruits").str.ends_with("go").alias("has_suffix")
)
# =>
# shape: (3, 2)
# ┌────────┬────────────┐
# │ fruits ┆ has_suffix │
# │ ---    ┆ ---        │
# │ str    ┆ bool       │
# ╞════════╪════════════╡
# │ apple  ┆ false      │
# │ mango  ┆ true       │
# │ null   ┆ null       │
# └────────┴────────────┘

Using ends_with as a filter condition:

df.filter(Polars.col("fruits").str.ends_with("go"))
# =>
# shape: (1, 1)
# ┌────────┐
# │ fruits │
# │ ---    │
# │ str    │
# ╞════════╡
# │ mango  │
# └────────┘

Parameters:

  • sub (String)

    Suffix substring.

Returns:



617
618
619
620
# File 'lib/polars/string_expr.rb', line 617

def ends_with(sub)
  sub = Utils.expr_to_lit_or_expr(sub, str_to_lit: true)._rbexpr
  Utils.wrap_expr(_rbexpr.str_ends_with(sub))
end

#explodeExpr

Returns a column with a separate row for every string character.

Examples:

df = Polars::DataFrame.new({"a": ["foo", "bar"]})
df.select(Polars.col("a").str.explode)
# =>
# shape: (6, 1)
# ┌─────┐
# │ a   │
# │ --- │
# │ str │
# ╞═════╡
# │ f   │
# │ o   │
# │ o   │
# │ b   │
# │ a   │
# │ r   │
# └─────┘

Returns:



1114
1115
1116
# File 'lib/polars/string_expr.rb', line 1114

def explode
  Utils.wrap_expr(_rbexpr.str_explode)
end

#extract(pattern, group_index: 1) ⇒ Expr

Extract the target capture group from provided patterns.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.extract('(\d+)')
  ]
)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ 123 │
# │ 678 │
# └─────┘

Parameters:

  • pattern (String)

    A valid regex pattern

  • group_index (Integer) (defaults to: 1)

    Index of the targeted capture group. Group 0 mean the whole pattern, first group begin at index 1 Default to the first capture group

Returns:



826
827
828
# File 'lib/polars/string_expr.rb', line 826

def extract(pattern, group_index: 1)
  Utils.wrap_expr(_rbexpr.str_extract(pattern, group_index))
end

#extract_all(pattern) ⇒ Expr

Extracts all matches for the given regex pattern.

Extracts each successive non-overlapping regex match in an individual string as an array.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.extract_all('(\d+)').alias("extracted_nrs")
  ]
)
# =>
# shape: (2, 1)
# ┌────────────────┐
# │ extracted_nrs  │
# │ ---            │
# │ list[str]      │
# ╞════════════════╡
# │ ["123", "45"]  │
# │ ["678", "910"] │
# └────────────────┘

Parameters:

  • pattern (String)

    A valid regex pattern

Returns:



857
858
859
860
# File 'lib/polars/string_expr.rb', line 857

def extract_all(pattern)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_extract_all(pattern._rbexpr))
end

#json_extract(dtype = nil, infer_schema_length: 100) ⇒ Expr

Parse string values as JSON.

Throw errors if encounter invalid JSON strings.

Examples:

df = Polars::DataFrame.new(
  {"json" => ['{"a":1, "b": true}', nil, '{"a":2, "b": false}']}
)
dtype = Polars::Struct.new([Polars::Field.new("a", Polars::Int64), Polars::Field.new("b", Polars::Boolean)])
df.select(Polars.col("json").str.json_extract(dtype))
# =>
# shape: (3, 1)
# ┌─────────────┐
# │ json        │
# │ ---         │
# │ struct[2]   │
# ╞═════════════╡
# │ {1,true}    │
# │ {null,null} │
# │ {2,false}   │
# └─────────────┘

Parameters:

  • dtype (Object) (defaults to: nil)

    The dtype to cast the extracted value to. If nil, the dtype will be inferred from the JSON value.

Returns:



689
690
691
692
693
694
# File 'lib/polars/string_expr.rb', line 689

def json_extract(dtype = nil, infer_schema_length: 100)
  if !dtype.nil?
    dtype = Utils.rb_type_to_dtype(dtype)
  end
  Utils.wrap_expr(_rbexpr.str_json_extract(dtype, infer_schema_length))
end

#json_path_match(json_path) ⇒ Expr

Extract the first match of json string with provided JSONPath expression.

Throw errors if encounter invalid json strings. All return value will be casted to Utf8 regardless of the original value.

Documentation on JSONPath standard can be found here.

Examples:

df = Polars::DataFrame.new(
  {"json_val" => ['{"a":"1"}', nil, '{"a":2}', '{"a":2.1}', '{"a":true}']}
)
df.select(Polars.col("json_val").str.json_path_match("$.a"))
# =>
# shape: (5, 1)
# ┌──────────┐
# │ json_val │
# │ ---      │
# │ str      │
# ╞══════════╡
# │ 1        │
# │ null     │
# │ 2        │
# │ 2.1      │
# │ true     │
# └──────────┘

Parameters:

  • json_path (String)

    A valid JSON path query string.

Returns:



727
728
729
# File 'lib/polars/string_expr.rb', line 727

def json_path_match(json_path)
  Utils.wrap_expr(_rbexpr.str_json_path_match(json_path))
end

#lengthsExpr

Note:

The returned lengths are equal to the number of bytes in the UTF8 string. If you need the length in terms of the number of characters, use n_chars instead.

Get length of the strings as :u32 (as number of bytes).

Examples:

df = Polars::DataFrame.new({"s" => ["Café", nil, "345", "東京"]}).with_columns(
  [
    Polars.col("s").str.lengths.alias("length"),
    Polars.col("s").str.n_chars.alias("nchars")
  ]
)
df
# =>
# shape: (4, 3)
# ┌──────┬────────┬────────┐
# │ s    ┆ length ┆ nchars │
# │ ---  ┆ ---    ┆ ---    │
# │ str  ┆ u32    ┆ u32    │
# ╞══════╪════════╪════════╡
# │ Café ┆ 5      ┆ 4      │
# │ null ┆ null   ┆ null   │
# │ 345  ┆ 3      ┆ 3      │
# │ 東京 ┆ 6      ┆ 2      │
# └──────┴────────┴────────┘

Returns:



242
243
244
# File 'lib/polars/string_expr.rb', line 242

def lengths
  Utils.wrap_expr(_rbexpr.str_len_bytes)
end

#ljust(length, fillchar = " ") ⇒ Expr Also known as: pad_end

Return the string left justified in a string of length length.

Padding is done using the specified fillchar. The original string is returned if length is less than or equal to s.length.

Examples:

df = Polars::DataFrame.new({"a" => ["cow", "monkey", nil, "hippopotamus"]})
df.select(Polars.col("a").str.ljust(8, "*"))
# =>
# shape: (4, 1)
# ┌──────────────┐
# │ a            │
# │ ---          │
# │ str          │
# ╞══════════════╡
# │ cow*****     │
# │ monkey**     │
# │ null         │
# │ hippopotamus │
# └──────────────┘

Parameters:

  • length (Integer)

    Justify left to this length.

  • fillchar (String) (defaults to: " ")

    Fill with this ASCII character.

Returns:



509
510
511
# File 'lib/polars/string_expr.rb', line 509

def ljust(length, fillchar = " ")
  Utils.wrap_expr(_rbexpr.str_pad_end(length, fillchar))
end

#n_charsExpr

Note:

If you know that you are working with ASCII text, lengths will be equivalent, and faster (returns length in terms of the number of bytes).

Get length of the strings as :u32 (as number of chars).

Examples:

df = Polars::DataFrame.new({"s" => ["Café", nil, "345", "東京"]}).with_columns(
  [
    Polars.col("s").str.lengths.alias("length"),
    Polars.col("s").str.n_chars.alias("nchars")
  ]
)
df
# =>
# shape: (4, 3)
# ┌──────┬────────┬────────┐
# │ s    ┆ length ┆ nchars │
# │ ---  ┆ ---    ┆ ---    │
# │ str  ┆ u32    ┆ u32    │
# ╞══════╪════════╪════════╡
# │ Café ┆ 5      ┆ 4      │
# │ null ┆ null   ┆ null   │
# │ 345  ┆ 3      ┆ 3      │
# │ 東京 ┆ 6      ┆ 2      │
# └──────┴────────┴────────┘

Returns:



274
275
276
# File 'lib/polars/string_expr.rb', line 274

def n_chars
  Utils.wrap_expr(_rbexpr.str_len_chars)
end

#parse_int(radix = 2, strict: true) ⇒ Expr

Parse integers with base radix from strings.

By default base 2. ParseError/Overflows become Nulls.

Examples:

df = Polars::DataFrame.new({"bin" => ["110", "101", "010", "invalid"]})
df.select(Polars.col("bin").str.parse_int(2, strict: false))
# =>
# shape: (4, 1)
# ┌──────┐
# │ bin  │
# │ ---  │
# │ i32  │
# ╞══════╡
# │ 6    │
# │ 5    │
# │ 2    │
# │ null │
# └──────┘

Parameters:

  • radix (Integer) (defaults to: 2)

    Positive integer which is the base of the string we are parsing. Default: 2.

  • strict (Boolean) (defaults to: true)

    Bool, Default=true will raise any ParseError or overflow as ComputeError. False silently convert to Null.

Returns:



1192
1193
1194
# File 'lib/polars/string_expr.rb', line 1192

def parse_int(radix = 2, strict: true)
  to_integer(base: 2, strict: strict).cast(Int32, strict: strict)
end

#replace(pattern, value, literal: false, n: 1) ⇒ Expr

Replace first matching regex/literal substring with a new string value.

Examples:

df = Polars::DataFrame.new({"id" => [1, 2], "text" => ["123abc", "abc456"]})
df.with_column(
  Polars.col("text").str.replace('abc\b', "ABC")
)
# =>
# shape: (2, 2)
# ┌─────┬────────┐
# │ id  ┆ text   │
# │ --- ┆ ---    │
# │ i64 ┆ str    │
# ╞═════╪════════╡
# │ 1   ┆ 123ABC │
# │ 2   ┆ abc456 │
# └─────┴────────┘

Parameters:

  • pattern (String)

    Regex pattern.

  • value (String)

    Replacement string.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

Returns:



1026
1027
1028
1029
1030
# File 'lib/polars/string_expr.rb', line 1026

def replace(pattern, value, literal: false, n: 1)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  value = Utils.expr_to_lit_or_expr(value, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_n(pattern._rbexpr, value._rbexpr, literal, n))
end

#replace_all(pattern, value, literal: false) ⇒ Expr

Replace all matching regex/literal substrings with a new string value.

Examples:

df = Polars::DataFrame.new({"id" => [1, 2], "text" => ["abcabc", "123a123"]})
df.with_column(Polars.col("text").str.replace_all("a", "-"))
# =>
# shape: (2, 2)
# ┌─────┬─────────┐
# │ id  ┆ text    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 1   ┆ -bc-bc  │
# │ 2   ┆ 123-123 │
# └─────┴─────────┘

Parameters:

  • pattern (String)

    Regex pattern.

  • value (String)

    Replacement string.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

Returns:



1056
1057
1058
1059
1060
# File 'lib/polars/string_expr.rb', line 1056

def replace_all(pattern, value, literal: false)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  value = Utils.expr_to_lit_or_expr(value, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_all(pattern._rbexpr, value._rbexpr, literal))
end

#rjust(length, fillchar = " ") ⇒ Expr Also known as: pad_start

Return the string right justified in a string of length length.

Padding is done using the specified fillchar. The original string is returned if length is less than or equal to s.length.

Examples:

df = Polars::DataFrame.new({"a" => ["cow", "monkey", nil, "hippopotamus"]})
df.select(Polars.col("a").str.rjust(8, "*"))
# =>
# shape: (4, 1)
# ┌──────────────┐
# │ a            │
# │ ---          │
# │ str          │
# ╞══════════════╡
# │ *****cow     │
# │ **monkey     │
# │ null         │
# │ hippopotamus │
# └──────────────┘

Parameters:

  • length (Integer)

    Justify right to this length.

  • fillchar (String) (defaults to: " ")

    Fill with this ASCII character.

Returns:



542
543
544
# File 'lib/polars/string_expr.rb', line 542

def rjust(length, fillchar = " ")
  Utils.wrap_expr(_rbexpr.str_pad_start(length, fillchar))
end

#slice(offset, length = nil) ⇒ Expr

Create subslices of the string values of a Utf8 Series.

Examples:

df = Polars::DataFrame.new({"s" => ["pear", nil, "papaya", "dragonfruit"]})
df.with_column(
  Polars.col("s").str.slice(-3).alias("s_sliced")
)
# =>
# shape: (4, 2)
# ┌─────────────┬──────────┐
# │ s           ┆ s_sliced │
# │ ---         ┆ ---      │
# │ str         ┆ str      │
# ╞═════════════╪══════════╡
# │ pear        ┆ ear      │
# │ null        ┆ null     │
# │ papaya      ┆ aya      │
# │ dragonfruit ┆ uit      │
# └─────────────┴──────────┘

Parameters:

  • offset (Integer)

    Start index. Negative indexing is supported.

  • length (Integer) (defaults to: nil)

    Length of the slice. If set to nil (default), the slice is taken to the end of the string.

Returns:



1089
1090
1091
# File 'lib/polars/string_expr.rb', line 1089

def slice(offset, length = nil)
  Utils.wrap_expr(_rbexpr.str_slice(offset, length))
end

#split(by, inclusive: false) ⇒ Expr

Split the string by a substring.

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", "foo-bar", "foo bar baz"]})
df.select(Polars.col("s").str.split(" "))
# =>
# shape: (3, 1)
# ┌───────────────────────┐
# │ s                     │
# │ ---                   │
# │ list[str]             │
# ╞═══════════════════════╡
# │ ["foo", "bar"]        │
# │ ["foo-bar"]           │
# │ ["foo", "bar", "baz"] │
# └───────────────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • inclusive (Boolean) (defaults to: false)

    If true, include the split character/string in the results.

Returns:



915
916
917
918
919
920
921
922
# File 'lib/polars/string_expr.rb', line 915

def split(by, inclusive: false)
  by = Utils.parse_as_expression(by, str_as_lit: true)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_inclusive(by))
  else
    Utils.wrap_expr(_rbexpr.str_split(by))
  end
end

#split_exact(by, n, inclusive: false) ⇒ Expr

Split the string by a substring using n splits.

Results in a struct of n+1 fields.

If it cannot make n splits, the remaining field elements will be null.

Examples:

df = Polars::DataFrame.new({"x" => ["a_1", nil, "c", "d_4"]})
df.select(
  [
    Polars.col("x").str.split_exact("_", 1).alias("fields")
  ]
)
# =>
# shape: (4, 1)
# ┌─────────────┐
# │ fields      │
# │ ---         │
# │ struct[2]   │
# ╞═════════════╡
# │ {"a","1"}   │
# │ {null,null} │
# │ {"c",null}  │
# │ {"d","4"}   │
# └─────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • n (Integer)

    Number of splits to make.

  • inclusive (Boolean) (defaults to: false)

    If true, include the split character/string in the results.

Returns:



958
959
960
961
962
963
964
965
# File 'lib/polars/string_expr.rb', line 958

def split_exact(by, n, inclusive: false)
  by = Utils.parse_as_expression(by, str_as_lit: true)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_exact_inclusive(by, n))
  else
    Utils.wrap_expr(_rbexpr.str_split_exact(by, n))
  end
end

#splitn(by, n) ⇒ Expr

Split the string by a substring, restricted to returning at most n items.

If the number of possible splits is less than n-1, the remaining field elements will be null. If the number of possible splits is n-1 or greater, the last (nth) substring will contain the remainder of the string.

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", nil, "foo-bar", "foo bar baz"]})
df.select(Polars.col("s").str.splitn(" ", 2).alias("fields"))
# =>
# shape: (4, 1)
# ┌───────────────────┐
# │ fields            │
# │ ---               │
# │ struct[2]         │
# ╞═══════════════════╡
# │ {"foo","bar"}     │
# │ {null,null}       │
# │ {"foo-bar",null}  │
# │ {"foo","bar baz"} │
# └───────────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • n (Integer)

    Max number of items to return.

Returns:



995
996
997
998
# File 'lib/polars/string_expr.rb', line 995

def splitn(by, n)
  by = Utils.parse_as_expression(by, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_splitn(by, n))
end

#starts_with(sub) ⇒ Expr

Check if string values start with a substring.

Examples:

df = Polars::DataFrame.new({"fruits" => ["apple", "mango", nil]})
df.with_column(
  Polars.col("fruits").str.starts_with("app").alias("has_prefix")
)
# =>
# shape: (3, 2)
# ┌────────┬────────────┐
# │ fruits ┆ has_prefix │
# │ ---    ┆ ---        │
# │ str    ┆ bool       │
# ╞════════╪════════════╡
# │ apple  ┆ true       │
# │ mango  ┆ false      │
# │ null   ┆ null       │
# └────────┴────────────┘

Using starts_with as a filter condition:

df.filter(Polars.col("fruits").str.starts_with("app"))
# =>
# shape: (1, 1)
# ┌────────┐
# │ fruits │
# │ ---    │
# │ str    │
# ╞════════╡
# │ apple  │
# └────────┘

Parameters:

  • sub (String)

    Prefix substring.

Returns:



657
658
659
660
# File 'lib/polars/string_expr.rb', line 657

def starts_with(sub)
  sub = Utils.expr_to_lit_or_expr(sub, str_to_lit: true)._rbexpr
  Utils.wrap_expr(_rbexpr.str_starts_with(sub))
end

#strip_chars(characters = nil) ⇒ Expr Also known as: strip

Remove leading and trailing whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.strip)
# =>
# shape: (3, 1)
# ┌───────┐
# │ foo   │
# │ ---   │
# │ str   │
# ╞═══════╡
# │ lead  │
# │ trail │
# │ both  │
# └───────┘

Parameters:

  • characters (String, nil) (defaults to: nil)

    An optional single character that should be trimmed.

Returns:



379
380
381
382
# File 'lib/polars/string_expr.rb', line 379

def strip_chars(characters = nil)
  characters = Utils.parse_as_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars(characters))
end

#strip_chars_end(characters = nil) ⇒ Expr Also known as: rstrip

Remove trailing whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.rstrip)
# =>
# shape: (3, 1)
# ┌───────┐
# │ foo   │
# │ ---   │
# │ str   │
# ╞═══════╡
# │  lead │
# │ trail │
# │  both │
# └───────┘

Parameters:

  • characters (String, nil) (defaults to: nil)

    An optional single character that should be trimmed.

Returns:



433
434
435
436
# File 'lib/polars/string_expr.rb', line 433

def strip_chars_end(characters = nil)
  characters = Utils.parse_as_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars_end(characters))
end

#strip_chars_start(characters = nil) ⇒ Expr Also known as: lstrip

Remove leading whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.lstrip)
# =>
# shape: (3, 1)
# ┌────────┐
# │ foo    │
# │ ---    │
# │ str    │
# ╞════════╡
# │ lead   │
# │ trail  │
# │ both   │
# └────────┘

Parameters:

  • characters (String, nil) (defaults to: nil)

    An optional single character that should be trimmed.

Returns:



406
407
408
409
# File 'lib/polars/string_expr.rb', line 406

def strip_chars_start(characters = nil)
  characters = Utils.parse_as_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars_start(characters))
end

#strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false) ⇒ Expr

Note:

When parsing a Datetime the column precision will be inferred from the format string, if given, eg: "%F %T%.3f" => Datetime("ms"). If no fractional second component is found then the default is "us".

Parse a Utf8 expression to a Date/Datetime/Time type.

Examples:

Dealing with a consistent format:

s = Polars::Series.new(["2020-01-01 01:00Z", "2020-01-01 02:00Z"])
s.str.strptime(Polars::Datetime, "%Y-%m-%d %H:%M%#z")
# =>
# shape: (2,)
# Series: '' [datetime[μs, UTC]]
# [
#         2020-01-01 01:00:00 UTC
#         2020-01-01 02:00:00 UTC
# ]

Dealing with different formats.

s = Polars::Series.new(
  "date",
  [
    "2021-04-22",
    "2022-01-04 00:00:00",
    "01/31/22",
    "Sun Jul  8 00:34:60 2001",
  ]
)
s.to_frame.select(
  Polars.coalesce(
    Polars.col("date").str.strptime(Polars::Date, "%F", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%F %T", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%D", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%c", strict: false)
  )
).to_series
# =>
# shape: (4,)
# Series: 'date' [date]
# [
#         2021-04-22
#         2022-01-04
#         2022-01-31
#         2001-07-08
# ]

Parameters:

  • dtype (Object)

    The data type to convert into. Can be either Date, Datetime, or Time.

  • format (String) (defaults to: nil)

    Format to use, refer to the chrono strftime documentation for specification. Example: "%y-%m-%d".

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)
    • If true, require an exact format match.
    • If false, allow the format to match anywhere in the target string.
  • utc (Boolean) (defaults to: false)

    Parse timezone aware datetimes as UTC. This may be useful if you have data with mixed offsets.

Returns:



197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
# File 'lib/polars/string_expr.rb', line 197

def strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false)
  _validate_format_argument(format)

  if dtype == Date
    to_date(format, strict: strict, exact: exact, cache: cache)
  elsif dtype == Datetime || dtype.is_a?(Datetime)
    dtype = Datetime.new if dtype == Datetime
    time_unit = dtype.time_unit
    time_zone = dtype.time_zone
    to_datetime(format, time_unit: time_unit, time_zone: time_zone, strict: strict, exact: exact, cache: cache)
  elsif dtype == Time
    to_time(format, strict: strict, cache: cache)
  else
    raise ArgumentError, "dtype should be of type {Date, Datetime, Time}"
  end
end

#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Expr

Convert a Utf8 column into a Date column.

Examples:

s = Polars::Series.new(["2020/01/01", "2020/02/01", "2020/03/01"])
s.str.to_date
# =>
# shape: (3,)
# Series: '' [date]
# [
#         2020-01-01
#         2020-02-01
#         2020-03-01
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d". If set to nil (default), the format is inferred from the data.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)

    Require an exact format match. If false, allow the format to match anywhere in the target string.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted dates to apply the conversion.

Returns:



40
41
42
43
# File 'lib/polars/string_expr.rb', line 40

def to_date(format = nil, strict: true, exact: true, cache: true)
  _validate_format_argument(format)
  Utils.wrap_expr(self._rbexpr.str_to_date(format, strict, exact, cache))
end

#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, use_earliest: nil, ambiguous: "raise") ⇒ Expr

Convert a Utf8 column into a Datetime column.

Examples:

s = Polars::Series.new(["2020-01-01 01:00Z", "2020-01-01 02:00Z"])
s.str.to_datetime("%Y-%m-%d %H:%M%#z")
# =>
# shape: (2,)
# Series: '' [datetime[μs, UTC]]
# [
#         2020-01-01 01:00:00 UTC
#         2020-01-01 02:00:00 UTC
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d %H:%M:%S". If set to nil (default), the format is inferred from the data.

  • time_unit ("us", "ns", "ms") (defaults to: nil)

    Unit of time for the resulting Datetime column. If set to nil (default), the time unit is inferred from the format string if given, eg: "%F %T%.3f" => Datetime("ms"). If no fractional second component is found, the default is "us".

  • time_zone (String) (defaults to: nil)

    Time zone for the resulting Datetime column.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)

    Require an exact format match. If false, allow the format to match anywhere in the target string.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted datetimes to apply the conversion.

Returns:



79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
# File 'lib/polars/string_expr.rb', line 79

def to_datetime(
  format = nil,
  time_unit: nil,
  time_zone: nil,
  strict: true,
  exact: true,
  cache: true,
  use_earliest: nil,
  ambiguous: "raise"
)
  _validate_format_argument(format)
  ambiguous = Utils.rename_use_earliest_to_ambiguous(use_earliest, ambiguous)
  ambiguous = Polars.lit(ambiguous) unless ambiguous.is_a?(Expr)
  Utils.wrap_expr(
    self._rbexpr.str_to_datetime(
      format,
      time_unit,
      time_zone,
      strict,
      exact,
      cache,
      ambiguous._rbexpr
    )
  )
end

#to_integer(base: 10, strict: true) ⇒ Expr

Convert an Utf8 column into an Int64 column with base radix.

Examples:

df = Polars::DataFrame.new({"bin" => ["110", "101", "010", "invalid"]})
df.with_columns(Polars.col("bin").str.to_integer(base: 2, strict: false).alias("parsed"))
# =>
# shape: (4, 2)
# ┌─────────┬────────┐
# │ bin     ┆ parsed │
# │ ---     ┆ ---    │
# │ str     ┆ i64    │
# ╞═════════╪════════╡
# │ 110     ┆ 6      │
# │ 101     ┆ 5      │
# │ 010     ┆ 2      │
# │ invalid ┆ null   │
# └─────────┴────────┘
df = Polars::DataFrame.new({"hex" => ["fa1e", "ff00", "cafe", nil]})
df.with_columns(Polars.col("hex").str.to_integer(base: 16, strict: true).alias("parsed"))
# =>
# shape: (4, 2)
# ┌──────┬────────┐
# │ hex  ┆ parsed │
# │ ---  ┆ ---    │
# │ str  ┆ i64    │
# ╞══════╪════════╡
# │ fa1e ┆ 64030  │
# │ ff00 ┆ 65280  │
# │ cafe ┆ 51966  │
# │ null ┆ null   │
# └──────┴────────┘

Parameters:

  • base (Integer) (defaults to: 10)

    Positive integer which is the base of the string we are parsing. Default: 10.

  • strict (Boolean) (defaults to: true)

    Bool, default=true will raise any ParseError or overflow as ComputeError. false silently convert to Null.

Returns:



1160
1161
1162
# File 'lib/polars/string_expr.rb', line 1160

def to_integer(base: 10, strict: true)
  Utils.wrap_expr(_rbexpr.str_to_integer(base, strict))
end

#to_lowercaseExpr

Transform to lowercase variant.

Examples:

df = Polars::DataFrame.new({"foo" => ["CAT", "DOG"]})
df.select(Polars.col("foo").str.to_lowercase)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ cat │
# │ dog │
# └─────┘

Returns:



354
355
356
# File 'lib/polars/string_expr.rb', line 354

def to_lowercase
  Utils.wrap_expr(_rbexpr.str_to_lowercase)
end

#to_time(format = nil, strict: true, cache: true) ⇒ Expr

Convert a Utf8 column into a Time column.

Examples:

s = Polars::Series.new(["01:00", "02:00", "03:00"])
s.str.to_time("%H:%M")
# =>
# shape: (3,)
# Series: '' [time]
# [
#         01:00:00
#         02:00:00
#         03:00:00
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%H:%M:%S". If set to nil (default), the format is inferred from the data.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted times to apply the conversion.

Returns:



130
131
132
133
# File 'lib/polars/string_expr.rb', line 130

def to_time(format = nil, strict: true, cache: true)
  _validate_format_argument(format)
  Utils.wrap_expr(_rbexpr.str_to_time(format, strict, cache))
end

#to_uppercaseExpr

Transform to uppercase variant.

Examples:

df = Polars::DataFrame.new({"foo" => ["cat", "dog"]})
df.select(Polars.col("foo").str.to_uppercase)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ CAT │
# │ DOG │
# └─────┘

Returns:



333
334
335
# File 'lib/polars/string_expr.rb', line 333

def to_uppercase
  Utils.wrap_expr(_rbexpr.str_to_uppercase)
end

#zfill(alignment) ⇒ Expr

Fills the string with zeroes.

Return a copy of the string left filled with ASCII '0' digits to make a string of length width.

A leading sign prefix ('+'/'-') is handled by inserting the padding after the sign character rather than before. The original string is returned if width is less than or equal to s.length.

Examples:

df = Polars::DataFrame.new(
  {
    "num" => [-10, -1, 0, 1, 10, 100, 1000, 10000, 100000, 1000000, nil]
  }
)
df.with_column(Polars.col("num").cast(String).str.zfill(5))
# =>
# shape: (11, 1)
# ┌─────────┐
# │ num     │
# │ ---     │
# │ str     │
# ╞═════════╡
# │ -0010   │
# │ -0001   │
# │ 00000   │
# │ 00001   │
# │ …       │
# │ 10000   │
# │ 100000  │
# │ 1000000 │
# │ null    │
# └─────────┘

Parameters:

  • alignment (Integer)

    Fill the value up to this length

Returns:



477
478
479
# File 'lib/polars/string_expr.rb', line 477

def zfill(alignment)
  Utils.wrap_expr(_rbexpr.str_zfill(alignment))
end