Class: Polars::StringExpr

Inherits:
Object
  • Object
show all
Defined in:
lib/polars/string_expr.rb

Overview

Namespace for string related expressions.

Instance Method Summary collapse

Instance Method Details

#concat(delimiter = "-") ⇒ Expr

Vertically concat the values in the Series to a single string value.

Examples:

df = Polars::DataFrame.new({"foo" => [1, nil, 2]})
df.select(Polars.col("foo").str.concat("-"))
# =>
# shape: (1, 1)
# ┌──────────┐
# │ foo      │
# │ ---      │
# │ str      │
# ╞══════════╡
# │ 1-null-2 │
# └──────────┘

Parameters:

  • delimiter (String) (defaults to: "-")

    The delimiter to insert between consecutive string values.

Returns:



292
293
294
# File 'lib/polars/string_expr.rb', line 292

def concat(delimiter = "-")
  Utils.wrap_expr(_rbexpr.str_concat(delimiter))
end

#contains(pattern, literal: false, strict: true) ⇒ Expr

Check if string contains a substring that matches a regex.

Examples:

df = Polars::DataFrame.new({"a" => ["Crab", "cat and dog", "rab$bit", nil]})
df.select(
  [
    Polars.col("a"),
    Polars.col("a").str.contains("cat|bit").alias("regex"),
    Polars.col("a").str.contains("rab$", literal: true).alias("literal")
  ]
)
# =>
# shape: (4, 3)
# ┌─────────────┬───────┬─────────┐
# │ a           ┆ regex ┆ literal │
# │ ---         ┆ ---   ┆ ---     │
# │ str         ┆ bool  ┆ bool    │
# ╞═════════════╪═══════╪═════════╡
# │ Crab        ┆ false ┆ false   │
# │ cat and dog ┆ true  ┆ false   │
# │ rab$bit     ┆ true  ┆ true    │
# │ null        ┆ null  ┆ null    │
# └─────────────┴───────┴─────────┘

Parameters:

  • pattern (String)

    A valid regex pattern.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

Returns:



558
559
560
561
# File 'lib/polars/string_expr.rb', line 558

def contains(pattern, literal: false, strict: true)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)._rbexpr
  Utils.wrap_expr(_rbexpr.str_contains(pattern, literal, strict))
end

#count_match(pattern) ⇒ Expr

Count all successive non-overlapping regex matches.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.count_match('\d').alias("count_digits")
  ]
)
# =>
# shape: (2, 1)
# ┌──────────────┐
# │ count_digits │
# │ ---          │
# │ u32          │
# ╞══════════════╡
# │ 5            │
# │ 6            │
# └──────────────┘

Parameters:

  • pattern (String)

    A valid regex pattern

Returns:



867
868
869
# File 'lib/polars/string_expr.rb', line 867

def count_match(pattern)
  Utils.wrap_expr(_rbexpr.count_match(pattern))
end

#decode(encoding, strict: true) ⇒ Expr

Decode a value using the provided encoding.

Examples:

df = Polars::DataFrame.new({"encoded" => ["666f6f", "626172", nil]})
df.select(Polars.col("encoded").str.decode("hex"))
# =>
# shape: (3, 1)
# ┌───────────────┐
# │ encoded       │
# │ ---           │
# │ binary        │
# ╞═══════════════╡
# │ [binary data] │
# │ [binary data] │
# │ null          │
# └───────────────┘

Parameters:

  • encoding ("hex", "base64")

    The encoding to use.

  • strict (Boolean) (defaults to: true)

    How to handle invalid inputs:

    • true: An error will be thrown if unable to decode a value.
    • false: Unhandled values will be replaced with nil.

Returns:



738
739
740
741
742
743
744
745
746
# File 'lib/polars/string_expr.rb', line 738

def decode(encoding, strict: true)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_decode(strict))
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_decode(strict))
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end

#encode(encoding) ⇒ Expr

Encode a value using the provided encoding.

Examples:

df = Polars::DataFrame.new({"strings" => ["foo", "bar", nil]})
df.select(Polars.col("strings").str.encode("hex"))
# =>
# shape: (3, 1)
# ┌─────────┐
# │ strings │
# │ ---     │
# │ str     │
# ╞═════════╡
# │ 666f6f  │
# │ 626172  │
# │ null    │
# └─────────┘

Parameters:

  • encoding ("hex", "base64")

    The encoding to use.

Returns:



769
770
771
772
773
774
775
776
777
# File 'lib/polars/string_expr.rb', line 769

def encode(encoding)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_encode)
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_encode)
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end

#ends_with(sub) ⇒ Expr

Check if string values end with a substring.

Examples:

df = Polars::DataFrame.new({"fruits" => ["apple", "mango", nil]})
df.with_column(
  Polars.col("fruits").str.ends_with("go").alias("has_suffix")
)
# =>
# shape: (3, 2)
# ┌────────┬────────────┐
# │ fruits ┆ has_suffix │
# │ ---    ┆ ---        │
# │ str    ┆ bool       │
# ╞════════╪════════════╡
# │ apple  ┆ false      │
# │ mango  ┆ true       │
# │ null   ┆ null       │
# └────────┴────────────┘

Using ends_with as a filter condition:

df.filter(Polars.col("fruits").str.ends_with("go"))
# =>
# shape: (1, 1)
# ┌────────┐
# │ fruits │
# │ ---    │
# │ str    │
# ╞════════╡
# │ mango  │
# └────────┘

Parameters:

  • sub (String)

    Suffix substring.

Returns:



598
599
600
601
# File 'lib/polars/string_expr.rb', line 598

def ends_with(sub)
  sub = Utils.expr_to_lit_or_expr(sub, str_to_lit: true)._rbexpr
  Utils.wrap_expr(_rbexpr.str_ends_with(sub))
end

#explodeExpr

Returns a column with a separate row for every string character.

Examples:

df = Polars::DataFrame.new({"a": ["foo", "bar"]})
df.select(Polars.col("a").str.explode)
# =>
# shape: (6, 1)
# ┌─────┐
# │ a   │
# │ --- │
# │ str │
# ╞═════╡
# │ f   │
# │ o   │
# │ o   │
# │ b   │
# │ a   │
# │ r   │
# └─────┘

Returns:



1090
1091
1092
# File 'lib/polars/string_expr.rb', line 1090

def explode
  Utils.wrap_expr(_rbexpr.str_explode)
end

#extract(pattern, group_index: 1) ⇒ Expr

Extract the target capture group from provided patterns.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.extract('(\d+)')
  ]
)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ 123 │
# │ 678 │
# └─────┘

Parameters:

  • pattern (String)

    A valid regex pattern

  • group_index (Integer) (defaults to: 1)

    Index of the targeted capture group. Group 0 mean the whole pattern, first group begin at index 1 Default to the first capture group

Returns:



807
808
809
# File 'lib/polars/string_expr.rb', line 807

def extract(pattern, group_index: 1)
  Utils.wrap_expr(_rbexpr.str_extract(pattern, group_index))
end

#extract_all(pattern) ⇒ Expr

Extracts all matches for the given regex pattern.

Extracts each successive non-overlapping regex match in an individual string as an array.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.extract_all('(\d+)').alias("extracted_nrs")
  ]
)
# =>
# shape: (2, 1)
# ┌────────────────┐
# │ extracted_nrs  │
# │ ---            │
# │ list[str]      │
# ╞════════════════╡
# │ ["123", "45"]  │
# │ ["678", "910"] │
# └────────────────┘

Parameters:

  • pattern (String)

    A valid regex pattern

Returns:



838
839
840
841
# File 'lib/polars/string_expr.rb', line 838

def extract_all(pattern)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_extract_all(pattern._rbexpr))
end

#json_extract(dtype = nil, infer_schema_length: 100) ⇒ Expr

Parse string values as JSON.

Throw errors if encounter invalid JSON strings.

Examples:

df = Polars::DataFrame.new(
  {"json" => ['{"a":1, "b": true}', nil, '{"a":2, "b": false}']}
)
dtype = Polars::Struct.new([Polars::Field.new("a", Polars::Int64), Polars::Field.new("b", Polars::Boolean)])
df.select(Polars.col("json").str.json_extract(dtype))
# =>
# shape: (3, 1)
# ┌─────────────┐
# │ json        │
# │ ---         │
# │ struct[2]   │
# ╞═════════════╡
# │ {1,true}    │
# │ {null,null} │
# │ {2,false}   │
# └─────────────┘

Parameters:

  • dtype (Object) (defaults to: nil)

    The dtype to cast the extracted value to. If nil, the dtype will be inferred from the JSON value.

Returns:



670
671
672
673
674
675
# File 'lib/polars/string_expr.rb', line 670

def json_extract(dtype = nil, infer_schema_length: 100)
  if !dtype.nil?
    dtype = Utils.rb_type_to_dtype(dtype)
  end
  Utils.wrap_expr(_rbexpr.str_json_extract(dtype, infer_schema_length))
end

#json_path_match(json_path) ⇒ Expr

Extract the first match of json string with provided JSONPath expression.

Throw errors if encounter invalid json strings. All return value will be casted to Utf8 regardless of the original value.

Documentation on JSONPath standard can be found here.

Examples:

df = Polars::DataFrame.new(
  {"json_val" => ['{"a":"1"}', nil, '{"a":2}', '{"a":2.1}', '{"a":true}']}
)
df.select(Polars.col("json_val").str.json_path_match("$.a"))
# =>
# shape: (5, 1)
# ┌──────────┐
# │ json_val │
# │ ---      │
# │ str      │
# ╞══════════╡
# │ 1        │
# │ null     │
# │ 2        │
# │ 2.1      │
# │ true     │
# └──────────┘

Parameters:

  • json_path (String)

    A valid JSON path query string.

Returns:



708
709
710
# File 'lib/polars/string_expr.rb', line 708

def json_path_match(json_path)
  Utils.wrap_expr(_rbexpr.str_json_path_match(json_path))
end

#lengthsExpr

Note:

The returned lengths are equal to the number of bytes in the UTF8 string. If you need the length in terms of the number of characters, use n_chars instead.

Get length of the strings as :u32 (as number of bytes).

Examples:

df = Polars::DataFrame.new({"s" => ["Café", nil, "345", "東京"]}).with_columns(
  [
    Polars.col("s").str.lengths.alias("length"),
    Polars.col("s").str.n_chars.alias("nchars")
  ]
)
df
# =>
# shape: (4, 3)
# ┌──────┬────────┬────────┐
# │ s    ┆ length ┆ nchars │
# │ ---  ┆ ---    ┆ ---    │
# │ str  ┆ u32    ┆ u32    │
# ╞══════╪════════╪════════╡
# │ Café ┆ 5      ┆ 4      │
# │ null ┆ null   ┆ null   │
# │ 345  ┆ 3      ┆ 3      │
# │ 東京 ┆ 6      ┆ 2      │
# └──────┴────────┴────────┘

Returns:



237
238
239
# File 'lib/polars/string_expr.rb', line 237

def lengths
  Utils.wrap_expr(_rbexpr.str_lengths)
end

#ljust(width, fillchar = " ") ⇒ Expr

Return the string left justified in a string of length width.

Padding is done using the specified fillchar. The original string is returned if width is less than or equal to s.length.

Examples:

df = Polars::DataFrame.new({"a" => ["cow", "monkey", nil, "hippopotamus"]})
df.select(Polars.col("a").str.ljust(8, "*"))
# =>
# shape: (4, 1)
# ┌──────────────┐
# │ a            │
# │ ---          │
# │ str          │
# ╞══════════════╡
# │ cow*****     │
# │ monkey**     │
# │ null         │
# │ hippopotamus │
# └──────────────┘

Parameters:

  • width (Integer)

    Justify left to this length.

  • fillchar (String) (defaults to: " ")

    Fill with this ASCII character.

Returns:



492
493
494
# File 'lib/polars/string_expr.rb', line 492

def ljust(width, fillchar = " ")
  Utils.wrap_expr(_rbexpr.str_ljust(width, fillchar))
end

#lstrip(matches = nil) ⇒ Expr

Remove leading whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.lstrip)
# =>
# shape: (3, 1)
# ┌────────┐
# │ foo    │
# │ ---    │
# │ str    │
# ╞════════╡
# │ lead   │
# │ trail  │
# │ both   │
# └────────┘

Parameters:

  • matches (String, nil) (defaults to: nil)

    An optional single character that should be trimmed.

Returns:



387
388
389
390
391
392
# File 'lib/polars/string_expr.rb', line 387

def lstrip(matches = nil)
  if !matches.nil? && matches.length > 1
    raise ArgumentError, "matches should contain a single character"
  end
  Utils.wrap_expr(_rbexpr.str_lstrip(matches))
end

#n_charsExpr

Note:

If you know that you are working with ASCII text, lengths will be equivalent, and faster (returns length in terms of the number of bytes).

Get length of the strings as :u32 (as number of chars).

Examples:

df = Polars::DataFrame.new({"s" => ["Café", nil, "345", "東京"]}).with_columns(
  [
    Polars.col("s").str.lengths.alias("length"),
    Polars.col("s").str.n_chars.alias("nchars")
  ]
)
df
# =>
# shape: (4, 3)
# ┌──────┬────────┬────────┐
# │ s    ┆ length ┆ nchars │
# │ ---  ┆ ---    ┆ ---    │
# │ str  ┆ u32    ┆ u32    │
# ╞══════╪════════╪════════╡
# │ Café ┆ 5      ┆ 4      │
# │ null ┆ null   ┆ null   │
# │ 345  ┆ 3      ┆ 3      │
# │ 東京 ┆ 6      ┆ 2      │
# └──────┴────────┴────────┘

Returns:



269
270
271
# File 'lib/polars/string_expr.rb', line 269

def n_chars
  Utils.wrap_expr(_rbexpr.str_n_chars)
end

#parse_int(radix = 2, strict: true) ⇒ Expr

Parse integers with base radix from strings.

By default base 2. ParseError/Overflows become Nulls.

Examples:

df = Polars::DataFrame.new({"bin" => ["110", "101", "010", "invalid"]})
df.select(Polars.col("bin").str.parse_int(2, strict: false))
# =>
# shape: (4, 1)
# ┌──────┐
# │ bin  │
# │ ---  │
# │ i32  │
# ╞══════╡
# │ 6    │
# │ 5    │
# │ 2    │
# │ null │
# └──────┘
df = Polars::DataFrame.new({"hex" => ["fa1e", "ff00", "cafe", nil]})
df.select(Polars.col("hex").str.parse_int(16, strict: true))
# =>
# shape: (4, 1)
# ┌───────┐
# │ hex   │
# │ ---   │
# │ i32   │
# ╞═══════╡
# │ 64030 │
# │ 65280 │
# │ 51966 │
# │ null  │
# └───────┘

Parameters:

  • radix (Integer) (defaults to: 2)

    Positive integer which is the base of the string we are parsing. Default: 2.

  • strict (Boolean) (defaults to: true)

    Bool, Default=true will raise any ParseError or overflow as ComputeError. False silently convert to Null.

Returns:



1138
1139
1140
# File 'lib/polars/string_expr.rb', line 1138

def parse_int(radix = 2, strict: true)
  Utils.wrap_expr(_rbexpr.str_parse_int(radix, strict))
end

#replace(pattern, value, literal: false, n: 1) ⇒ Expr

Replace first matching regex/literal substring with a new string value.

Examples:

df = Polars::DataFrame.new({"id" => [1, 2], "text" => ["123abc", "abc456"]})
df.with_column(
  Polars.col("text").str.replace('abc\b', "ABC")
)
# =>
# shape: (2, 2)
# ┌─────┬────────┐
# │ id  ┆ text   │
# │ --- ┆ ---    │
# │ i64 ┆ str    │
# ╞═════╪════════╡
# │ 1   ┆ 123ABC │
# │ 2   ┆ abc456 │
# └─────┴────────┘

Parameters:

  • pattern (String)

    Regex pattern.

  • value (String)

    Replacement string.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

Returns:



1002
1003
1004
1005
1006
# File 'lib/polars/string_expr.rb', line 1002

def replace(pattern, value, literal: false, n: 1)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  value = Utils.expr_to_lit_or_expr(value, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_n(pattern._rbexpr, value._rbexpr, literal, n))
end

#replace_all(pattern, value, literal: false) ⇒ Expr

Replace all matching regex/literal substrings with a new string value.

Examples:

df = Polars::DataFrame.new({"id" => [1, 2], "text" => ["abcabc", "123a123"]})
df.with_column(Polars.col("text").str.replace_all("a", "-"))
# =>
# shape: (2, 2)
# ┌─────┬─────────┐
# │ id  ┆ text    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 1   ┆ -bc-bc  │
# │ 2   ┆ 123-123 │
# └─────┴─────────┘

Parameters:

  • pattern (String)

    Regex pattern.

  • value (String)

    Replacement string.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

Returns:



1032
1033
1034
1035
1036
# File 'lib/polars/string_expr.rb', line 1032

def replace_all(pattern, value, literal: false)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  value = Utils.expr_to_lit_or_expr(value, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_all(pattern._rbexpr, value._rbexpr, literal))
end

#rjust(width, fillchar = " ") ⇒ Expr

Return the string right justified in a string of length width.

Padding is done using the specified fillchar. The original string is returned if width is less than or equal to s.length.

Examples:

df = Polars::DataFrame.new({"a" => ["cow", "monkey", nil, "hippopotamus"]})
df.select(Polars.col("a").str.rjust(8, "*"))
# =>
# shape: (4, 1)
# ┌──────────────┐
# │ a            │
# │ ---          │
# │ str          │
# ╞══════════════╡
# │ *****cow     │
# │ **monkey     │
# │ null         │
# │ hippopotamus │
# └──────────────┘

Parameters:

  • width (Integer)

    Justify right to this length.

  • fillchar (String) (defaults to: " ")

    Fill with this ASCII character.

Returns:



524
525
526
# File 'lib/polars/string_expr.rb', line 524

def rjust(width, fillchar = " ")
  Utils.wrap_expr(_rbexpr.str_rjust(width, fillchar))
end

#rstrip(matches = nil) ⇒ Expr

Remove trailing whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.rstrip)
# =>
# shape: (3, 1)
# ┌───────┐
# │ foo   │
# │ ---   │
# │ str   │
# ╞═══════╡
# │  lead │
# │ trail │
# │  both │
# └───────┘

Parameters:

  • matches (String, nil) (defaults to: nil)

    An optional single character that should be trimmed.

Returns:



415
416
417
418
419
420
# File 'lib/polars/string_expr.rb', line 415

def rstrip(matches = nil)
  if !matches.nil? && matches.length > 1
    raise ArgumentError, "matches should contain a single character"
  end
  Utils.wrap_expr(_rbexpr.str_rstrip(matches))
end

#slice(offset, length = nil) ⇒ Expr

Create subslices of the string values of a Utf8 Series.

Examples:

df = Polars::DataFrame.new({"s" => ["pear", nil, "papaya", "dragonfruit"]})
df.with_column(
  Polars.col("s").str.slice(-3).alias("s_sliced")
)
# =>
# shape: (4, 2)
# ┌─────────────┬──────────┐
# │ s           ┆ s_sliced │
# │ ---         ┆ ---      │
# │ str         ┆ str      │
# ╞═════════════╪══════════╡
# │ pear        ┆ ear      │
# │ null        ┆ null     │
# │ papaya      ┆ aya      │
# │ dragonfruit ┆ uit      │
# └─────────────┴──────────┘

Parameters:

  • offset (Integer)

    Start index. Negative indexing is supported.

  • length (Integer) (defaults to: nil)

    Length of the slice. If set to nil (default), the slice is taken to the end of the string.

Returns:



1065
1066
1067
# File 'lib/polars/string_expr.rb', line 1065

def slice(offset, length = nil)
  Utils.wrap_expr(_rbexpr.str_slice(offset, length))
end

#split(by, inclusive: false) ⇒ Expr

Split the string by a substring.

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", "foo-bar", "foo bar baz"]})
df.select(Polars.col("s").str.split(" "))
# =>
# shape: (3, 1)
# ┌───────────────────────┐
# │ s                     │
# │ ---                   │
# │ list[str]             │
# ╞═══════════════════════╡
# │ ["foo", "bar"]        │
# │ ["foo-bar"]           │
# │ ["foo", "bar", "baz"] │
# └───────────────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • inclusive (Boolean) (defaults to: false)

    If true, include the split character/string in the results.

Returns:



894
895
896
897
898
899
900
# File 'lib/polars/string_expr.rb', line 894

def split(by, inclusive: false)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_inclusive(by))
  else
    Utils.wrap_expr(_rbexpr.str_split(by))
  end
end

#split_exact(by, n, inclusive: false) ⇒ Expr

Split the string by a substring using n splits.

Results in a struct of n+1 fields.

If it cannot make n splits, the remaining field elements will be null.

Examples:

df = Polars::DataFrame.new({"x" => ["a_1", nil, "c", "d_4"]})
df.select(
  [
    Polars.col("x").str.split_exact("_", 1).alias("fields")
  ]
)
# =>
# shape: (4, 1)
# ┌─────────────┐
# │ fields      │
# │ ---         │
# │ struct[2]   │
# ╞═════════════╡
# │ {"a","1"}   │
# │ {null,null} │
# │ {"c",null}  │
# │ {"d","4"}   │
# └─────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • n (Integer)

    Number of splits to make.

  • inclusive (Boolean) (defaults to: false)

    If true, include the split character/string in the results.

Returns:



936
937
938
939
940
941
942
# File 'lib/polars/string_expr.rb', line 936

def split_exact(by, n, inclusive: false)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_exact_inclusive(by, n))
  else
    Utils.wrap_expr(_rbexpr.str_split_exact(by, n))
  end
end

#splitn(by, n) ⇒ Expr

Split the string by a substring, restricted to returning at most n items.

If the number of possible splits is less than n-1, the remaining field elements will be null. If the number of possible splits is n-1 or greater, the last (nth) substring will contain the remainder of the string.

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", nil, "foo-bar", "foo bar baz"]})
df.select(Polars.col("s").str.splitn(" ", 2).alias("fields"))
# =>
# shape: (4, 1)
# ┌───────────────────┐
# │ fields            │
# │ ---               │
# │ struct[2]         │
# ╞═══════════════════╡
# │ {"foo","bar"}     │
# │ {null,null}       │
# │ {"foo-bar",null}  │
# │ {"foo","bar baz"} │
# └───────────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • n (Integer)

    Max number of items to return.

Returns:



972
973
974
# File 'lib/polars/string_expr.rb', line 972

def splitn(by, n)
  Utils.wrap_expr(_rbexpr.str_splitn(by, n))
end

#starts_with(sub) ⇒ Expr

Check if string values start with a substring.

Examples:

df = Polars::DataFrame.new({"fruits" => ["apple", "mango", nil]})
df.with_column(
  Polars.col("fruits").str.starts_with("app").alias("has_prefix")
)
# =>
# shape: (3, 2)
# ┌────────┬────────────┐
# │ fruits ┆ has_prefix │
# │ ---    ┆ ---        │
# │ str    ┆ bool       │
# ╞════════╪════════════╡
# │ apple  ┆ true       │
# │ mango  ┆ false      │
# │ null   ┆ null       │
# └────────┴────────────┘

Using starts_with as a filter condition:

df.filter(Polars.col("fruits").str.starts_with("app"))
# =>
# shape: (1, 1)
# ┌────────┐
# │ fruits │
# │ ---    │
# │ str    │
# ╞════════╡
# │ apple  │
# └────────┘

Parameters:

  • sub (String)

    Prefix substring.

Returns:



638
639
640
641
# File 'lib/polars/string_expr.rb', line 638

def starts_with(sub)
  sub = Utils.expr_to_lit_or_expr(sub, str_to_lit: true)._rbexpr
  Utils.wrap_expr(_rbexpr.str_starts_with(sub))
end

#strip(matches = nil) ⇒ Expr

Remove leading and trailing whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.strip)
# =>
# shape: (3, 1)
# ┌───────┐
# │ foo   │
# │ ---   │
# │ str   │
# ╞═══════╡
# │ lead  │
# │ trail │
# │ both  │
# └───────┘

Parameters:

  • matches (String, nil) (defaults to: nil)

    An optional single character that should be trimmed.

Returns:



359
360
361
362
363
364
# File 'lib/polars/string_expr.rb', line 359

def strip(matches = nil)
  if !matches.nil? && matches.length > 1
    raise ArgumentError, "matches should contain a single character"
  end
  Utils.wrap_expr(_rbexpr.str_strip(matches))
end

#strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false) ⇒ Expr

Note:

When parsing a Datetime the column precision will be inferred from the format string, if given, eg: "%F %T%.3f" => Datetime("ms"). If no fractional second component is found then the default is "us".

Parse a Utf8 expression to a Date/Datetime/Time type.

Examples:

Dealing with a consistent format:

s = Polars::Series.new(["2020-01-01 01:00Z", "2020-01-01 02:00Z"])
s.str.strptime(Polars::Datetime, "%Y-%m-%d %H:%M%#z")
# =>
# shape: (2,)
# Series: '' [datetime[μs, UTC]]
# [
#         2020-01-01 01:00:00 UTC
#         2020-01-01 02:00:00 UTC
# ]

Dealing with different formats.

s = Polars::Series.new(
  "date",
  [
    "2021-04-22",
    "2022-01-04 00:00:00",
    "01/31/22",
    "Sun Jul  8 00:34:60 2001",
  ]
)
s.to_frame.select(
  Polars.coalesce(
    Polars.col("date").str.strptime(Polars::Date, "%F", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%F %T", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%D", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%c", strict: false)
  )
).to_series
# =>
# shape: (4,)
# Series: 'date' [date]
# [
#         2021-04-22
#         2022-01-04
#         2022-01-31
#         2001-07-08
# ]

Parameters:

  • dtype (Object)

    The data type to convert into. Can be either Date, Datetime, or Time.

  • format (String) (defaults to: nil)

    Format to use, refer to the chrono strftime documentation for specification. Example: "%y-%m-%d".

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)
    • If true, require an exact format match.
    • If false, allow the format to match anywhere in the target string.
  • utc (Boolean) (defaults to: false)

    Parse timezone aware datetimes as UTC. This may be useful if you have data with mixed offsets.

Returns:



192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
# File 'lib/polars/string_expr.rb', line 192

def strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false)
  _validate_format_argument(format)

  if dtype == Date
    to_date(format, strict: strict, exact: exact, cache: cache)
  elsif dtype == Datetime || dtype.is_a?(Datetime)
    dtype = Datetime.new if dtype == Datetime
    time_unit = dtype.time_unit
    time_zone = dtype.time_zone
    to_datetime(format, time_unit: time_unit, time_zone: time_zone, strict: strict, exact: exact, cache: cache)
  elsif dtype == Time
    to_time(format, strict: strict, cache: cache)
  else
    raise ArgumentError, "dtype should be of type {Date, Datetime, Time}"
  end
end

#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Expr

Convert a Utf8 column into a Date column.

Examples:

s = Polars::Series.new(["2020/01/01", "2020/02/01", "2020/03/01"])
s.str.to_date
# =>
# shape: (3,)
# Series: '' [date]
# [
#         2020-01-01
#         2020-02-01
#         2020-03-01
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d". If set to nil (default), the format is inferred from the data.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)

    Require an exact format match. If false, allow the format to match anywhere in the target string.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted dates to apply the conversion.

Returns:



40
41
42
43
# File 'lib/polars/string_expr.rb', line 40

def to_date(format = nil, strict: true, exact: true, cache: true)
  _validate_format_argument(format)
  Utils.wrap_expr(self._rbexpr.str_to_date(format, strict, exact, cache))
end

#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true) ⇒ Expr

Convert a Utf8 column into a Datetime column.

Examples:

s = Polars::Series.new(["2020-01-01 01:00Z", "2020-01-01 02:00Z"])
s.str.to_datetime("%Y-%m-%d %H:%M%#z")
# =>
# shape: (2,)
# Series: '' [datetime[μs, UTC]]
# [
#         2020-01-01 01:00:00 UTC
#         2020-01-01 02:00:00 UTC
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d %H:%M:%S". If set to nil (default), the format is inferred from the data.

  • time_unit ("us", "ns", "ms") (defaults to: nil)

    Unit of time for the resulting Datetime column. If set to nil (default), the time unit is inferred from the format string if given, eg: "%F %T%.3f" => Datetime("ms"). If no fractional second component is found, the default is "us".

  • time_zone (String) (defaults to: nil)

    Time zone for the resulting Datetime column.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)

    Require an exact format match. If false, allow the format to match anywhere in the target string.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted datetimes to apply the conversion.

Returns:



79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
# File 'lib/polars/string_expr.rb', line 79

def to_datetime(
  format = nil,
  time_unit: nil,
  time_zone: nil,
  strict: true,
  exact: true,
  cache: true
)
  _validate_format_argument(format)
  Utils.wrap_expr(
    self._rbexpr.str_to_datetime(
      format,
      time_unit,
      time_zone,
      strict,
      exact,
      cache
    )
  )
end

#to_lowercaseExpr

Transform to lowercase variant.

Examples:

df = Polars::DataFrame.new({"foo" => ["CAT", "DOG"]})
df.select(Polars.col("foo").str.to_lowercase)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ cat │
# │ dog │
# └─────┘

Returns:



334
335
336
# File 'lib/polars/string_expr.rb', line 334

def to_lowercase
  Utils.wrap_expr(_rbexpr.str_to_lowercase)
end

#to_time(format = nil, strict: true, cache: true) ⇒ Expr

Convert a Utf8 column into a Time column.

Examples:

s = Polars::Series.new(["01:00", "02:00", "03:00"])
s.str.to_time("%H:%M")
# =>
# shape: (3,)
# Series: '' [time]
# [
#         01:00:00
#         02:00:00
#         03:00:00
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%H:%M:%S". If set to nil (default), the format is inferred from the data.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted times to apply the conversion.

Returns:



125
126
127
128
# File 'lib/polars/string_expr.rb', line 125

def to_time(format = nil, strict: true, cache: true)
  _validate_format_argument(format)
  Utils.wrap_expr(_rbexpr.str_to_time(format, strict, cache))
end

#to_uppercaseExpr

Transform to uppercase variant.

Examples:

df = Polars::DataFrame.new({"foo" => ["cat", "dog"]})
df.select(Polars.col("foo").str.to_uppercase)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ CAT │
# │ DOG │
# └─────┘

Returns:



313
314
315
# File 'lib/polars/string_expr.rb', line 313

def to_uppercase
  Utils.wrap_expr(_rbexpr.str_to_uppercase)
end

#zfill(alignment) ⇒ Expr

Fills the string with zeroes.

Return a copy of the string left filled with ASCII '0' digits to make a string of length width.

A leading sign prefix ('+'/'-') is handled by inserting the padding after the sign character rather than before. The original string is returned if width is less than or equal to s.length.

Examples:

df = Polars::DataFrame.new(
  {
    "num" => [-10, -1, 0, 1, 10, 100, 1000, 10000, 100000, 1000000, nil]
  }
)
df.with_column(Polars.col("num").cast(String).str.zfill(5))
# =>
# shape: (11, 1)
# ┌─────────┐
# │ num     │
# │ ---     │
# │ str     │
# ╞═════════╡
# │ -0010   │
# │ -0001   │
# │ 00000   │
# │ 00001   │
# │ …       │
# │ 10000   │
# │ 100000  │
# │ 1000000 │
# │ null    │
# └─────────┘

Parameters:

  • alignment (Integer)

    Fill the value up to this length

Returns:



460
461
462
# File 'lib/polars/string_expr.rb', line 460

def zfill(alignment)
  Utils.wrap_expr(_rbexpr.str_zfill(alignment))
end