Class: Polars::StringExpr

Inherits:

Object


Defined in: lib/polars/string_expr.rb

Overview

Namespace for string related expressions.

Instance Method Summary

#concat(delimiter = "-") ⇒ Expr
Vertically concat the values in the Series to a single string value.
#contains(pattern, literal: false) ⇒ Expr
Check if string contains a substring that matches a regex.
#count_match(pattern) ⇒ Expr
Count all successive non-overlapping regex matches.
#decode(encoding, strict: false) ⇒ Expr
Decode a value using the provided encoding.
#encode(encoding) ⇒ Expr
Encode a value using the provided encoding.
#ends_with(sub) ⇒ Expr
Check if string values end with a substring.
#extract(pattern, group_index: 1) ⇒ Expr
Extract the target capture group from provided patterns.
#extract_all(pattern) ⇒ Expr
Extracts all matches for the given regex pattern.
#json_path_match(json_path) ⇒ Expr
Extract the first match of json string with provided JSONPath expression.
#lengths ⇒ Expr
Get length of the strings as :u32 (as number of bytes).
#ljust(width, fillchar = " ") ⇒ Expr
Return the string left justified in a string of length width.
#lstrip(matches = nil) ⇒ Expr
Remove leading whitespace.
#n_chars ⇒ Expr
Get length of the strings as :u32 (as number of chars).
#replace(pattern, value, literal: false) ⇒ Expr
Replace first matching regex/literal substring with a new string value.
#replace_all(pattern, value, literal: false) ⇒ Expr
Replace all matching regex/literal substrings with a new string value.
#rjust(width, fillchar = " ") ⇒ Expr
Return the string right justified in a string of length width.
#rstrip(matches = nil) ⇒ Expr
Remove trailing whitespace.
#slice(offset, length = nil) ⇒ Expr
Create subslices of the string values of a Utf8 Series.
#split(by, inclusive: false) ⇒ Expr
Split the string by a substring.
#split_exact(by, n, inclusive: false) ⇒ Expr
Split the string by a substring using n splits.
#splitn(by, n) ⇒ Expr
Split the string by a substring, restricted to returning at most n items.
#starts_with(sub) ⇒ Expr
Check if string values start with a substring.
#strip(matches = nil) ⇒ Expr
Remove leading and trailing whitespace.
#strptime(datatype, fmt = nil, strict: true, exact: true, cache: true, tz_aware: false) ⇒ Expr
Parse a Utf8 expression to a Date/Datetime/Time type.
#to_lowercase ⇒ Expr
Transform to lowercase variant.
#to_uppercase ⇒ Expr
Transform to uppercase variant.
#zfill(alignment) ⇒ Expr
Fills the string with zeroes.

Instance Method Details

#concat(delimiter = "-") ⇒ `Expr`

Vertically concat the values in the Series to a single string value.

Examples:

df = Polars::DataFrame.new({"foo" => [1, nil, 2]})
df.select(Polars.col("foo").str.concat("-"))
# =>
# shape: (1, 1)
# ┌──────────┐
# │ foo      │
# │ ---      │
# │ str      │
# ╞══════════╡
# │ 1-null-2 │
# └──────────┘

Parameters:

delimiter (String) (defaults to: "-") —
The delimiter to insert between consecutive string values.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 179

def concat(delimiter = "-")
  Utils.wrap_expr(_rbexpr.str_concat(delimiter))
end

#contains(pattern, literal: false) ⇒ `Expr`

Check if string contains a substring that matches a regex.

Examples:

df = Polars::DataFrame.new({"a" => ["Crab", "cat and dog", "rab$bit", nil]})
df.select(
  [
    Polars.col("a"),
    Polars.col("a").str.contains("cat|bit").alias("regex"),
    Polars.col("a").str.contains("rab$", literal: true).alias("literal")
  ]
)
# =>
# shape: (4, 3)
# ┌─────────────┬───────┬─────────┐
# │ a           ┆ regex ┆ literal │
# │ ---         ┆ ---   ┆ ---     │
# │ str         ┆ bool  ┆ bool    │
# ╞═════════════╪═══════╪═════════╡
# │ Crab        ┆ false ┆ false   │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
# │ cat and dog ┆ true  ┆ false   │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
# │ rab$bit     ┆ true  ┆ true    │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
# │ null        ┆ null  ┆ null    │
# └─────────────┴───────┴─────────┘

Parameters:

pattern (String) —
A valid regex pattern.
literal (Boolean) (defaults to: false) —
Treat pattern as a literal string.
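
The effect of the literal flag can be sketched in plain Ruby. This is only an analogy: `contains_like` is a hypothetical helper, not part of Polars, and Polars evaluates patterns with its own regex engine, whose syntax may differ from Ruby's.

```ruby
# Hypothetical plain-Ruby analogy for the regex/literal distinction.
def contains_like(value, pattern, literal: false)
  return nil if value.nil? # a null input stays null

  if literal
    value.include?(pattern)           # plain substring test
  else
    value.match?(Regexp.new(pattern)) # pattern is compiled as a regex
  end
end

contains_like("rab$bit", "rab$")                # => false ("$" anchors the end of the line)
contains_like("rab$bit", "rab$", literal: true) # => true  ("$" is an ordinary character)
```

The literal form is the safer choice whenever the search text may contain regex metacharacters such as "$" or ".".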

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 470

def contains(pattern, literal: false)
  Utils.wrap_expr(_rbexpr.str_contains(pattern, literal))
end

#count_match(pattern) ⇒ `Expr`

Count all successive non-overlapping regex matches.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.count_match('\d').alias("count_digits")
  ]
)
# =>
# shape: (2, 1)
# ┌──────────────┐
# │ count_digits │
# │ ---          │
# │ u32          │
# ╞══════════════╡
# │ 5            │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 6            │
# └──────────────┘
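
The counts in the table above can be reproduced in plain Ruby as a sanity check. `count_matches` is a hypothetical helper used for illustration, and Ruby's regex engine stands in for the one Polars uses.

```ruby
# Hypothetical plain-Ruby analogy: count non-overlapping regex matches.
def count_matches(value, pattern)
  return nil if value.nil?

  # String#scan walks the string left to right and never reuses a character,
  # so the matches it returns are successive and non-overlapping.
  value.scan(Regexp.new(pattern)).size
end

count_matches("123 bla 45 asd", '\d') # => 5
count_matches("xyz 678 910t", '\d')   # => 6
```

Switching the pattern to '\d+' counts runs of digits instead of single digits, giving 2 for each of the two strings.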

Parameters:

pattern (String) —
A valid regex pattern.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 757

def count_match(pattern)
  Utils.wrap_expr(_rbexpr.count_match(pattern))
end

#decode(encoding, strict: false) ⇒ `Expr`

Decode a value using the provided encoding.

Examples:

df = Polars::DataFrame.new({"encoded" => ["666f6f", "626172", nil]})
df.select(Polars.col("encoded").str.decode("hex"))
# =>
# shape: (3, 1)
# ┌─────────┐
# │ encoded │
# │ ---     │
# │ str     │
# ╞═════════╡
# │ foo     │
# ├╌╌╌╌╌╌╌╌╌┤
# │ bar     │
# ├╌╌╌╌╌╌╌╌╌┤
# │ null    │
# └─────────┘

Parameters:

encoding ("hex", "base64") —
The encoding to use.
strict (Boolean) (defaults to: false) —
How to handle invalid inputs:
- true: An error will be thrown if unable to decode a value.
- false: Unhandled values will be replaced with nil.
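
The two strict modes can be illustrated with a plain-Ruby sketch of hex decoding. `hex_decode` is a hypothetical helper written for this illustration; it is not how Polars implements decoding, and the exact error Polars raises is not shown here.

```ruby
# Hypothetical plain-Ruby analogy for strict vs. non-strict decoding.
def hex_decode(value, strict: false)
  return nil if value.nil?

  # Valid input is an even number of hexadecimal digits.
  unless value.match?(/\A(?:\h\h)*\z/)
    raise ArgumentError, "invalid hex input: #{value.inspect}" if strict

    return nil # non-strict: the undecodable value becomes nil
  end

  [value].pack("H*").force_encoding(Encoding::UTF_8)
end

hex_decode("666f6f")           # => "foo"
hex_decode("zz")               # => nil
# hex_decode("zz", strict: true) raises ArgumentError
```

Non-strict mode suits exploratory work on messy data; strict mode is preferable when silent nils would hide a data-quality problem.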

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 623

def decode(encoding, strict: false)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_decode(strict))
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_decode(strict))
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end

#encode(encoding) ⇒ `Expr`

Encode a value using the provided encoding.

Examples:

df = Polars::DataFrame.new({"strings" => ["foo", "bar", nil]})
df.select(Polars.col("strings").str.encode("hex"))
# =>
# shape: (3, 1)
# ┌─────────┐
# │ strings │
# │ ---     │
# │ str     │
# ╞═════════╡
# │ 666f6f  │
# ├╌╌╌╌╌╌╌╌╌┤
# │ 626172  │
# ├╌╌╌╌╌╌╌╌╌┤
# │ null    │
# └─────────┘

Parameters:

encoding ("hex", "base64") —
The encoding to use.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 656

def encode(encoding)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_encode)
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_encode)
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end

#ends_with(sub) ⇒ `Expr`

Check if string values end with a substring.

Examples:

df = Polars::DataFrame.new({"fruits" => ["apple", "mango", nil]})
df.with_column(
  Polars.col("fruits").str.ends_with("go").alias("has_suffix")
)
# =>
# shape: (3, 2)
# ┌────────┬────────────┐
# │ fruits ┆ has_suffix │
# │ ---    ┆ ---        │
# │ str    ┆ bool       │
# ╞════════╪════════════╡
# │ apple  ┆ false      │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ mango  ┆ true       │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ null   ┆ null       │
# └────────┴────────────┘

Using `ends_with` as a filter condition:

df.filter(Polars.col("fruits").str.ends_with("go"))
# =>
# shape: (1, 1)
# ┌────────┐
# │ fruits │
# │ ---    │
# │ str    │
# ╞════════╡
# │ mango  │
# └────────┘

Parameters:

sub (String) —
Suffix substring.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 511

def ends_with(sub)
  Utils.wrap_expr(_rbexpr.str_ends_with(sub))
end

#extract(pattern, group_index: 1) ⇒ `Expr`

Extract the target capture group from provided patterns.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.extract('(\d+)')
  ]
)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ 123 │
# ├╌╌╌╌╌┤
# │ 678 │
# └─────┘

Parameters:

pattern (String) —
A valid regex pattern.
group_index (Integer) (defaults to: 1) —
Index of the targeted capture group. Group 0 means the whole pattern; the first capture group has index 1. Defaults to the first capture group.
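
Group numbering follows the usual regex convention, which a plain-Ruby sketch can make concrete. `extract_group` is a hypothetical helper, and Ruby's regex engine is used here only as a stand-in.

```ruby
# Hypothetical plain-Ruby analogy for capture-group indexing.
def extract_group(value, pattern, group_index: 1)
  return nil if value.nil?

  match = Regexp.new(pattern).match(value)
  match && match[group_index] # nil when the pattern does not match
end

extract_group("123 bla 45 asd", '(\d+) (\w+)', group_index: 0) # => "123 bla"
extract_group("123 bla 45 asd", '(\d+) (\w+)')                 # => "123"
extract_group("123 bla 45 asd", '(\d+) (\w+)', group_index: 2) # => "bla"
```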

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 695

def extract(pattern, group_index: 1)
  Utils.wrap_expr(_rbexpr.str_extract(pattern, group_index))
end

#extract_all(pattern) ⇒ `Expr`

Extracts all matches for the given regex pattern.

Extracts each successive non-overlapping regex match in an individual string as an array.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.extract_all('(\d+)').alias("extracted_nrs")
  ]
)
# =>
# shape: (2, 1)
# ┌────────────────┐
# │ extracted_nrs  │
# │ ---            │
# │ list[str]      │
# ╞════════════════╡
# │ ["123", "45"]  │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ ["678", "910"] │
# └────────────────┘

Parameters:

pattern (String) —
A valid regex pattern.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 727

def extract_all(pattern)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_extract_all(pattern._rbexpr))
end

#json_path_match(json_path) ⇒ `Expr`

Extract the first match of json string with provided JSONPath expression.

Raises an error if an invalid JSON string is encountered. All return values are cast to Utf8, regardless of their original type.

Refer to the documentation on the JSONPath standard for the query syntax.

Examples:

df = Polars::DataFrame.new(
  {"json_val" => ['{"a":"1"}', nil, '{"a":2}', '{"a":2.1}', '{"a":true}']}
)
df.select(Polars.col("json_val").str.json_path_match("$.a"))
# =>
# shape: (5, 1)
# ┌──────────┐
# │ json_val │
# │ ---      │
# │ str      │
# ╞══════════╡
# │ 1        │
# ├╌╌╌╌╌╌╌╌╌╌┤
# │ null     │
# ├╌╌╌╌╌╌╌╌╌╌┤
# │ 2        │
# ├╌╌╌╌╌╌╌╌╌╌┤
# │ 2.1      │
# ├╌╌╌╌╌╌╌╌╌╌┤
# │ true     │
# └──────────┘

Parameters:

json_path (String) —
A valid JSON path query string.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 591

def json_path_match(json_path)
  Utils.wrap_expr(_rbexpr.str_json_path_match(json_path))
end

#lengths ⇒ `Expr`

Note:

The returned lengths are equal to the number of bytes in the UTF8 string. If you need the length in terms of the number of characters, use n_chars instead.

Get length of the strings as :u32 (as number of bytes).

Examples:

df = Polars::DataFrame.new({"s" => ["Café", nil, "345", "東京"]}).with_columns(
  [
    Polars.col("s").str.lengths.alias("length"),
    Polars.col("s").str.n_chars.alias("nchars")
  ]
)
df
# =>
# shape: (4, 3)
# ┌──────┬────────┬────────┐
# │ s    ┆ length ┆ nchars │
# │ ---  ┆ ---    ┆ ---    │
# │ str  ┆ u32    ┆ u32    │
# ╞══════╪════════╪════════╡
# │ Café ┆ 5      ┆ 4      │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
# │ null ┆ null   ┆ null   │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
# │ 345  ┆ 3      ┆ 3      │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
# │ 東京 ┆ 6      ┆ 2      │
# └──────┴────────┴────────┘
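
The byte/character distinction is the same one Ruby draws between String#bytesize and String#length for UTF-8 strings. The sketch below uses a hypothetical helper, `lengths_like`, to reproduce the two columns in plain Ruby.

```ruby
# Hypothetical plain-Ruby analogy: byte length vs. character length.
def lengths_like(value)
  return nil if value.nil?

  { bytes: value.bytesize, chars: value.length }
end

lengths_like("Café") # => {:bytes=>5, :chars=>4}  ("é" takes two bytes in UTF-8)
lengths_like("345")  # => {:bytes=>3, :chars=>3}  (ASCII: one byte per character)
lengths_like("東京") # => {:bytes=>6, :chars=>2}  (three bytes per character)
```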

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 121

def lengths
  Utils.wrap_expr(_rbexpr.str_lengths)
end

#ljust(width, fillchar = " ") ⇒ `Expr`

Return the string left justified in a string of length width.

Padding is done using the specified fillchar. The original string is returned if width is less than or equal to s.length.

Examples:

df = Polars::DataFrame.new({"a" => ["cow", "monkey", nil, "hippopotamus"]})
df.select(Polars.col("a").str.ljust(8, "*"))
# =>
# shape: (4, 1)
# ┌──────────────┐
# │ a            │
# │ ---          │
# │ str          │
# ╞══════════════╡
# │ cow*****     │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ monkey**     │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ null         │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ hippopotamus │
# └──────────────┘
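
Ruby's own String#ljust and String#rjust follow the same padding rule, so the output above can be cross-checked without Polars. `ljust_like` and `rjust_like` are hypothetical wrappers that only add nil handling.

```ruby
# Hypothetical plain-Ruby analogy for left and right justification.
def ljust_like(value, width, fillchar = " ")
  value&.ljust(width, fillchar) # nil stays nil
end

def rjust_like(value, width, fillchar = " ")
  value&.rjust(width, fillchar)
end

ljust_like("cow", 8, "*")          # => "cow*****"
rjust_like("cow", 8, "*")          # => "*****cow"
ljust_like("hippopotamus", 8, "*") # => "hippopotamus" (already longer than width)
```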

Parameters:

width (Integer) —
Justify left to this length.
fillchar (String) (defaults to: " ") —
Fill with this ASCII character.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 398

def ljust(width, fillchar = " ")
  Utils.wrap_expr(_rbexpr.str_ljust(width, fillchar))
end

#lstrip(matches = nil) ⇒ `Expr`

Remove leading whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.lstrip)
# =>
# shape: (3, 1)
# ┌────────┐
# │ foo    │
# │ ---    │
# │ str    │
# ╞════════╡
# │ lead   │
# ├╌╌╌╌╌╌╌╌┤
# │ trail  │
# ├╌╌╌╌╌╌╌╌┤
# │ both   │
# └────────┘

Parameters:

matches (String, nil) (defaults to: nil) —
An optional single character that should be trimmed.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 280

def lstrip(matches = nil)
  if !matches.nil? && matches.length > 1
    raise ArgumentError, "matches should contain a single character"
  end
  Utils.wrap_expr(_rbexpr.str_lstrip(matches))
end

#n_chars ⇒ `Expr`

Note:

If you know that you are working with ASCII text, lengths will be equivalent, and faster (returns length in terms of the number of bytes).

Get length of the strings as :u32 (as number of chars).

Examples:

df = Polars::DataFrame.new({"s" => ["Café", nil, "345", "東京"]}).with_columns(
  [
    Polars.col("s").str.lengths.alias("length"),
    Polars.col("s").str.n_chars.alias("nchars")
  ]
)
df
# =>
# shape: (4, 3)
# ┌──────┬────────┬────────┐
# │ s    ┆ length ┆ nchars │
# │ ---  ┆ ---    ┆ ---    │
# │ str  ┆ u32    ┆ u32    │
# ╞══════╪════════╪════════╡
# │ Café ┆ 5      ┆ 4      │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
# │ null ┆ null   ┆ null   │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
# │ 345  ┆ 3      ┆ 3      │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
# │ 東京 ┆ 6      ┆ 2      │
# └──────┴────────┴────────┘

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 156

def n_chars
  Utils.wrap_expr(_rbexpr.str_n_chars)
end

#replace(pattern, value, literal: false) ⇒ `Expr`

Replace first matching regex/literal substring with a new string value.

Examples:

df = Polars::DataFrame.new({"id" => [1, 2], "text" => ["123abc", "abc456"]})
df.with_column(
  Polars.col("text").str.replace('abc\b', "ABC")
)
# =>
# shape: (2, 2)
# ┌─────┬────────┐
# │ id  ┆ text   │
# │ --- ┆ ---    │
# │ i64 ┆ str    │
# ╞═════╪════════╡
# │ 1   ┆ 123ABC │
# ├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
# │ 2   ┆ abc456 │
# └─────┴────────┘
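
The difference between replacing the first match and replacing every match mirrors Ruby's String#sub and String#gsub. The helpers below, `replace_first` and `replace_every`, are hypothetical and use Ruby's regex engine as a stand-in for the one Polars uses.

```ruby
# Hypothetical plain-Ruby analogy for replace (first match) and replace_all.
def replace_first(value, pattern, replacement, literal: false)
  return nil if value.nil?

  # A String pattern is matched literally by String#sub; a Regexp is not.
  value.sub(literal ? pattern : Regexp.new(pattern), replacement)
end

def replace_every(value, pattern, replacement, literal: false)
  return nil if value.nil?

  value.gsub(literal ? pattern : Regexp.new(pattern), replacement)
end

replace_first("123abc", 'abc\b', "ABC") # => "123ABC"
replace_first("abc456", 'abc\b', "ABC") # => "abc456" (no word boundary after "abc")
replace_first("abcabc", "a", "-")       # => "-bcabc"
replace_every("abcabc", "a", "-")       # => "-bc-bc"
```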

Parameters:

pattern (String) —
Regex pattern.
value (String) —
Replacement string.
literal (Boolean) (defaults to: false) —
Treat pattern as a literal string.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 901

def replace(pattern, value, literal: false)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  value = Utils.expr_to_lit_or_expr(value, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace(pattern._rbexpr, value._rbexpr, literal))
end

#replace_all(pattern, value, literal: false) ⇒ `Expr`

Replace all matching regex/literal substrings with a new string value.

Examples:

df = Polars::DataFrame.new({"id" => [1, 2], "text" => ["abcabc", "123a123"]})
df.with_column(Polars.col("text").str.replace_all("a", "-"))
# =>
# shape: (2, 2)
# ┌─────┬─────────┐
# │ id  ┆ text    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 1   ┆ -bc-bc  │
# ├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
# │ 2   ┆ 123-123 │
# └─────┴─────────┘

Parameters:

pattern (String) —
Regex pattern.
value (String) —
Replacement string.
literal (Boolean) (defaults to: false) —
Treat pattern as a literal string.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 932

def replace_all(pattern, value, literal: false)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  value = Utils.expr_to_lit_or_expr(value, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_all(pattern._rbexpr, value._rbexpr, literal))
end

#rjust(width, fillchar = " ") ⇒ `Expr`

Return the string right justified in a string of length width.

Padding is done using the specified fillchar. The original string is returned if width is less than or equal to s.length.

Examples:

df = Polars::DataFrame.new({"a" => ["cow", "monkey", nil, "hippopotamus"]})
df.select(Polars.col("a").str.rjust(8, "*"))
# =>
# shape: (4, 1)
# ┌──────────────┐
# │ a            │
# │ ---          │
# │ str          │
# ╞══════════════╡
# │ *****cow     │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ **monkey     │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ null         │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ hippopotamus │
# └──────────────┘

Parameters:

width (Integer) —
Justify right to this length.
fillchar (String) (defaults to: " ") —
Fill with this ASCII character.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 433

def rjust(width, fillchar = " ")
  Utils.wrap_expr(_rbexpr.str_rjust(width, fillchar))
end

#rstrip(matches = nil) ⇒ `Expr`

Remove trailing whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.rstrip)
# =>
# shape: (3, 1)
# ┌───────┐
# │ foo   │
# │ ---   │
# │ str   │
# ╞═══════╡
# │  lead │
# ├╌╌╌╌╌╌╌┤
# │ trail │
# ├╌╌╌╌╌╌╌┤
# │  both │
# └───────┘

Parameters:

matches (String, nil) (defaults to: nil) —
An optional single character that should be trimmed.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 310

def rstrip(matches = nil)
  if !matches.nil? && matches.length > 1
    raise ArgumentError, "matches should contain a single character"
  end
  Utils.wrap_expr(_rbexpr.str_rstrip(matches))
end

#slice(offset, length = nil) ⇒ `Expr`

Create subslices of the string values of a Utf8 Series.

Examples:

df = Polars::DataFrame.new({"s" => ["pear", nil, "papaya", "dragonfruit"]})
df.with_column(
  Polars.col("s").str.slice(-3).alias("s_sliced")
)
# =>
# shape: (4, 2)
# ┌─────────────┬──────────┐
# │ s           ┆ s_sliced │
# │ ---         ┆ ---      │
# │ str         ┆ str      │
# ╞═════════════╪══════════╡
# │ pear        ┆ ear      │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
# │ null        ┆ null     │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
# │ papaya      ┆ aya      │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
# │ dragonfruit ┆ uit      │
# └─────────────┴──────────┘

Parameters:

offset (Integer) —
Start index. Negative indexing is supported.
length (Integer) (defaults to: nil) —
Length of the slice. If set to nil (default), the slice is taken to the end of the string.
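
Negative offsets and the optional length behave much like Ruby's own string indexing. The sketch below uses a hypothetical helper, `slice_like`; edge cases such as offsets beyond the string length may behave differently in Polars.

```ruby
# Hypothetical plain-Ruby analogy for offset/length slicing.
def slice_like(value, offset, length = nil)
  return nil if value.nil?

  # With no length, take everything from offset to the end of the string.
  length.nil? ? value[offset..] : value[offset, length]
end

slice_like("pear", -3)          # => "ear"
slice_like("dragonfruit", -3)   # => "uit"
slice_like("dragonfruit", 0, 6) # => "dragon"
```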

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 968

def slice(offset, length = nil)
  Utils.wrap_expr(_rbexpr.str_slice(offset, length))
end

#split(by, inclusive: false) ⇒ `Expr`

Split the string by a substring.

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", "foo-bar", "foo bar baz"]})
df.select(Polars.col("s").str.split(" "))
# =>
# shape: (3, 1)
# ┌───────────────────────┐
# │ s                     │
# │ ---                   │
# │ list[str]             │
# ╞═══════════════════════╡
# │ ["foo", "bar"]        │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ ["foo-bar"]           │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ ["foo", "bar", "baz"] │
# └───────────────────────┘

Parameters:

by (String) —
Substring to split by.
inclusive (Boolean) (defaults to: false) —
If true, include the split character/string in the results.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 786

def split(by, inclusive: false)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_inclusive(by))
  else
    Utils.wrap_expr(_rbexpr.str_split(by))
  end
end

#split_exact(by, n, inclusive: false) ⇒ `Expr`

Split the string by a substring using n splits.

Results in a struct of n+1 fields.

If it cannot make n splits, the remaining field elements will be null.

Examples:

df = Polars::DataFrame.new({"x" => ["a_1", nil, "c", "d_4"]})
df.select(
  [
    Polars.col("x").str.split_exact("_", 1).alias("fields")
  ]
)
# =>
# shape: (4, 1)
# ┌─────────────┐
# │ fields      │
# │ ---         │
# │ struct[2]   │
# ╞═════════════╡
# │ {"a","1"}   │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ {null,null} │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ {"c",null}  │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ {"d","4"}   │
# └─────────────┘

Parameters:

by (String) —
Substring to split by.
n (Integer) —
Number of splits to make.
inclusive (Boolean) (defaults to: false) —
If true, include the split character/string in the results.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 831

def split_exact(by, n, inclusive: false)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_exact_inclusive(by, n))
  else
    Utils.wrap_expr(_rbexpr.str_split_exact(by, n))
  end
end

#splitn(by, n) ⇒ `Expr`

Split the string by a substring, restricted to returning at most n items.

If the number of possible splits is less than n-1, the remaining field elements will be null. If the number of possible splits is n-1 or greater, the last (nth) substring will contain the remainder of the string.

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", nil, "foo-bar", "foo bar baz"]})
df.select(Polars.col("s").str.splitn(" ", 2).alias("fields"))
# =>
# shape: (4, 1)
# ┌───────────────────┐
# │ fields            │
# │ ---               │
# │ struct[2]         │
# ╞═══════════════════╡
# │ {"foo","bar"}     │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ {null,null}       │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ {"foo-bar",null}  │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ {"foo","bar baz"} │
# └───────────────────┘
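
The "at most n items, remainder in the last one" rule can be sketched in plain Ruby with the limit argument of String#split. `splitn_like` is a hypothetical helper that returns an Array padded with nil rather than a struct.

```ruby
# Hypothetical plain-Ruby analogy for splitn.
def splitn_like(value, by, n)
  return Array.new(n) if value.nil? # null input: every field is nil

  # Escape the delimiter so it is matched literally. This also avoids the
  # special whitespace handling Ruby applies to a bare " " delimiter.
  parts = value.split(Regexp.new(Regexp.escape(by)), n)
  parts + Array.new(n - parts.size) # pad missing fields with nil
end

splitn_like("foo bar", " ", 2)     # => ["foo", "bar"]
splitn_like("foo-bar", " ", 2)     # => ["foo-bar", nil]
splitn_like("foo bar baz", " ", 2) # => ["foo", "bar baz"]
```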

Parameters:

by (String) —
Substring to split by.
n (Integer) —
Max number of items to return.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 870

def splitn(by, n)
  Utils.wrap_expr(_rbexpr.str_splitn(by, n))
end

#starts_with(sub) ⇒ `Expr`

Check if string values start with a substring.

Examples:

df = Polars::DataFrame.new({"fruits" => ["apple", "mango", nil]})
df.with_column(
  Polars.col("fruits").str.starts_with("app").alias("has_prefix")
)
# =>
# shape: (3, 2)
# ┌────────┬────────────┐
# │ fruits ┆ has_prefix │
# │ ---    ┆ ---        │
# │ str    ┆ bool       │
# ╞════════╪════════════╡
# │ apple  ┆ true       │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ mango  ┆ false      │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ null   ┆ null       │
# └────────┴────────────┘

Using `starts_with` as a filter condition:

df.filter(Polars.col("fruits").str.starts_with("app"))
# =>
# shape: (1, 1)
# ┌────────┐
# │ fruits │
# │ ---    │
# │ str    │
# ╞════════╡
# │ apple  │
# └────────┘

Parameters:

sub (String) —
Prefix substring.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 552

def starts_with(sub)
  Utils.wrap_expr(_rbexpr.str_starts_with(sub))
end

#strip(matches = nil) ⇒ `Expr`

Remove leading and trailing whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.strip)
# =>
# shape: (3, 1)
# ┌───────┐
# │ foo   │
# │ ---   │
# │ str   │
# ╞═══════╡
# │ lead  │
# ├╌╌╌╌╌╌╌┤
# │ trail │
# ├╌╌╌╌╌╌╌┤
# │ both  │
# └───────┘

Parameters:

matches (String, nil) (defaults to: nil) —
An optional single character that should be trimmed.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 250

def strip(matches = nil)
  if !matches.nil? && matches.length > 1
    raise ArgumentError, "matches should contain a single character"
  end
  Utils.wrap_expr(_rbexpr.str_strip(matches))
end

#strptime(datatype, fmt = nil, strict: true, exact: true, cache: true, tz_aware: false) ⇒ `Expr`

Note:

When parsing a Datetime, the column precision is inferred from the format string, if given, e.g. "%F %T%.3f" => Datetime("ms"). If no fractional second component is found, the default is "us".

Parse a Utf8 expression to a Date/Datetime/Time type.

Examples:

s = Polars::Series.new(
  "date",
  [
    "2021-04-22",
    "2022-01-04 00:00:00",
    "01/31/22",
    "Sun Jul  8 00:34:60 2001"
  ]
)
s.to_frame.with_column(
  Polars.col("date")
    .str.strptime(:date, "%F", strict: false)
    .fill_null(
      Polars.col("date").str.strptime(:date, "%F %T", strict: false)
    )
    .fill_null(Polars.col("date").str.strptime(:date, "%D", strict: false))
    .fill_null(Polars.col("date").str.strptime(:date, "%c", strict: false))
)
# =>
# shape: (4, 1)
# ┌────────────┐
# │ date       │
# │ ---        │
# │ date       │
# ╞════════════╡
# │ 2021-04-22 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2022-01-04 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2022-01-31 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2001-07-08 │
# └────────────┘

Parameters:

datatype (Symbol) —
:date, :datetime, or :time.
fmt (String) (defaults to: nil) —
Format to use, refer to the chrono strftime documentation for specification. Example: "%y-%m-%d".
strict (Boolean) (defaults to: true) —
Raise an error if any conversion fails.
exact (Boolean) (defaults to: true) —
- If true, require an exact format match.
- If false, allow the format to match anywhere in the target string.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 67

def strptime(datatype, fmt = nil, strict: true, exact: true, cache: true, tz_aware: false)
  if !Utils.is_polars_dtype(datatype)
    raise ArgumentError, "expected: {DataType} got: #{datatype}"
  end

  if datatype == :date
    Utils.wrap_expr(_rbexpr.str_parse_date(fmt, strict, exact, cache))
  elsif datatype == :datetime
    # TODO fix
    tu = nil # datatype.tu
    dtcol = Utils.wrap_expr(_rbexpr.str_parse_datetime(fmt, strict, exact, cache, tz_aware))
    if tu.nil?
      dtcol
    else
      dtcol.dt.cast_time_unit(tu)
    end
  elsif datatype == :time
    Utils.wrap_expr(_rbexpr.str_parse_time(fmt, strict, exact, cache))
  else
    raise ArgumentError, "dtype should be of type :date, :datetime, or :time"
  end
end

#to_lowercase ⇒ `Expr`

Transform to lowercase variant.

Examples:

df = Polars::DataFrame.new({"foo" => ["CAT", "DOG"]})
df.select(Polars.col("foo").str.to_lowercase)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ cat │
# ├╌╌╌╌╌┤
# │ dog │
# └─────┘

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 223

def to_lowercase
  Utils.wrap_expr(_rbexpr.str_to_lowercase)
end

#to_uppercase ⇒ `Expr`

Transform to uppercase variant.

Examples:

df = Polars::DataFrame.new({"foo" => ["cat", "dog"]})
df.select(Polars.col("foo").str.to_uppercase)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ CAT │
# ├╌╌╌╌╌┤
# │ DOG │
# └─────┘

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 201

def to_uppercase
  Utils.wrap_expr(_rbexpr.str_to_uppercase)
end

#zfill(alignment) ⇒ `Expr`

Fills the string with zeroes.

Return a copy of the string left-filled with ASCII '0' digits to make a string of length alignment.

A leading sign prefix ('+'/'-') is handled by inserting the padding after the sign character rather than before. The original string is returned if alignment is less than or equal to s.length.

Examples:

df = Polars::DataFrame.new(
  {
    "num" => [-10, -1, 0, 1, 10, 100, 1000, 10000, 100000, 1000000, nil]
  }
)
df.with_column(Polars.col("num").cast(String).str.zfill(5))
# =>
# shape: (11, 1)
# ┌─────────┐
# │ num     │
# │ ---     │
# │ str     │
# ╞═════════╡
# │ -0010   │
# ├╌╌╌╌╌╌╌╌╌┤
# │ -0001   │
# ├╌╌╌╌╌╌╌╌╌┤
# │ 00000   │
# ├╌╌╌╌╌╌╌╌╌┤
# │ 00001   │
# ├╌╌╌╌╌╌╌╌╌┤
# │ ...     │
# ├╌╌╌╌╌╌╌╌╌┤
# │ 10000   │
# ├╌╌╌╌╌╌╌╌╌┤
# │ 100000  │
# ├╌╌╌╌╌╌╌╌╌┤
# │ 1000000 │
# ├╌╌╌╌╌╌╌╌╌┤
# │ null    │
# └─────────┘
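
The sign-handling rule described above can be sketched in plain Ruby. `zfill_like` is a hypothetical helper written from that description; it is not the Polars implementation.

```ruby
# Hypothetical plain-Ruby analogy for zero-filling with sign handling.
def zfill_like(value, alignment)
  # nil stays nil; strings already at least `alignment` long are unchanged.
  return value if value.nil? || value.length >= alignment

  sign = value.start_with?("+", "-") ? value[0] : ""
  # Pad the part after the sign so the zeroes land between sign and digits.
  sign + value[sign.length..].rjust(alignment - sign.length, "0")
end

zfill_like("-10", 5)     # => "-0010"
zfill_like("1", 5)       # => "00001"
zfill_like("1000000", 5) # => "1000000"
```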

Parameters:

alignment (Integer) —
Fill the value up to this length.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 363

def zfill(alignment)
  Utils.wrap_expr(_rbexpr.str_zfill(alignment))
end
