Class: Polars::StringExpr

Inherits:

Object

Defined in: lib/polars/string_expr.rb

Overview

Namespace for string related expressions.

Instance Method Summary

#concat(delimiter = "-", ignore_nulls: true) ⇒ Expr
Vertically concat the values in the Series to a single string value.
#contains(pattern, literal: false, strict: true) ⇒ Expr
Check if string contains a substring that matches a regex.
#count_matches(pattern, literal: false) ⇒ Expr (also: #count_match)
Count all successive non-overlapping regex matches.
#decode(encoding, strict: true) ⇒ Expr
Decode a value using the provided encoding.
#encode(encoding) ⇒ Expr
Encode a value using the provided encoding.
#ends_with(sub) ⇒ Expr
Check if string values end with a substring.
#explode ⇒ Expr
Returns a column with a separate row for every string character.
#extract(pattern, group_index: 1) ⇒ Expr
Extract the target capture group from provided patterns.
#extract_all(pattern) ⇒ Expr
Extracts all matches for the given regex pattern.
#json_extract(dtype = nil, infer_schema_length: 100) ⇒ Expr
Parse string values as JSON.
#json_path_match(json_path) ⇒ Expr
Extract the first match of json string with provided JSONPath expression.
#lengths ⇒ Expr
Get length of the strings as :u32 (as number of bytes).
#ljust(length, fillchar = " ") ⇒ Expr (also: #pad_end)
Return the string left justified in a string of length length.
#n_chars ⇒ Expr
Get length of the strings as :u32 (as number of chars).
#parse_int(radix = 2, strict: true) ⇒ Expr
Parse integers with base radix from strings.
#replace(pattern, value, literal: false, n: 1) ⇒ Expr
Replace first matching regex/literal substring with a new string value.
#replace_all(pattern, value, literal: false) ⇒ Expr
Replace all matching regex/literal substrings with a new string value.
#rjust(length, fillchar = " ") ⇒ Expr (also: #pad_start)
Return the string right justified in a string of length length.
#slice(offset, length = nil) ⇒ Expr
Create subslices of the string values of a Utf8 Series.
#split(by, inclusive: false) ⇒ Expr
Split the string by a substring.
#split_exact(by, n, inclusive: false) ⇒ Expr
Split the string by a substring using n splits.
#splitn(by, n) ⇒ Expr
Split the string by a substring, restricted to returning at most n items.
#starts_with(sub) ⇒ Expr
Check if string values start with a substring.
#strip_chars(characters = nil) ⇒ Expr (also: #strip)
Remove leading and trailing whitespace.
#strip_chars_end(characters = nil) ⇒ Expr (also: #rstrip)
Remove trailing whitespace.
#strip_chars_start(characters = nil) ⇒ Expr (also: #lstrip)
Remove leading whitespace.
#strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false) ⇒ Expr
Parse a Utf8 expression to a Date/Datetime/Time type.
#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Expr
Convert a Utf8 column into a Date column.
#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, use_earliest: nil, ambiguous: "raise") ⇒ Expr
Convert a Utf8 column into a Datetime column.
#to_integer(base: 10, strict: true) ⇒ Expr
Convert a Utf8 column into an Int64 column with base radix.
#to_lowercase ⇒ Expr
Transform to lowercase variant.
#to_time(format = nil, strict: true, cache: true) ⇒ Expr
Convert a Utf8 column into a Time column.
#to_uppercase ⇒ Expr
Transform to uppercase variant.
#zfill(alignment) ⇒ Expr
Fills the string with zeroes.

Instance Method Details

#concat(delimiter = "-", ignore_nulls: true) ⇒ `Expr`

Vertically concat the values in the Series to a single string value.

Examples:

df = Polars::DataFrame.new({"foo" => [1, nil, 2]})
df.select(Polars.col("foo").str.concat("-"))
# =>
# shape: (1, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ 1-2 │
# └─────┘

df = Polars::DataFrame.new({"foo" => [1, nil, 2]})
df.select(Polars.col("foo").str.concat("-", ignore_nulls: false))
# =>
# shape: (1, 1)
# ┌──────┐
# │ foo  │
# │ ---  │
# │ str  │
# ╞══════╡
# │ null │
# └──────┘

Parameters:

delimiter (String) (defaults to: "-") —
The delimiter to insert between consecutive string values.
ignore_nulls (Boolean) (defaults to: true) —
Ignore null values (default).
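
The effect of ignore_nulls can be sketched in plain Ruby, without Polars. The helper name concat_sketch is hypothetical and only mirrors the two examples above; it is not how the library implements the method.

```ruby
# Plain-Ruby sketch of the semantics (hypothetical helper, not Polars code).
def concat_sketch(values, delimiter = "-", ignore_nulls: true)
  # With ignore_nulls: false, a single nil makes the whole result nil.
  return nil if !ignore_nulls && values.any?(&:nil?)

  # Otherwise nils are dropped before joining.
  values.compact.join(delimiter)
end

concat_sketch([1, nil, 2])                            # => "1-2"
concat_sketch([1, nil, 2], "-", ignore_nulls: false)  # => nil
```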

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 312

def concat(delimiter = "-", ignore_nulls: true)
  Utils.wrap_expr(_rbexpr.str_concat(delimiter, ignore_nulls))
end

#contains(pattern, literal: false, strict: true) ⇒ `Expr`

Check if string contains a substring that matches a regex.

Examples:

df = Polars::DataFrame.new({"a" => ["Crab", "cat and dog", "rab$bit", nil]})
df.select(
  [
    Polars.col("a"),
    Polars.col("a").str.contains("cat|bit").alias("regex"),
    Polars.col("a").str.contains("rab$", literal: true).alias("literal")
  ]
)
# =>
# shape: (4, 3)
# ┌─────────────┬───────┬─────────┐
# │ a           ┆ regex ┆ literal │
# │ ---         ┆ ---   ┆ ---     │
# │ str         ┆ bool  ┆ bool    │
# ╞═════════════╪═══════╪═════════╡
# │ Crab        ┆ false ┆ false   │
# │ cat and dog ┆ true  ┆ false   │
# │ rab$bit     ┆ true  ┆ true    │
# │ null        ┆ null  ┆ null    │
# └─────────────┴───────┴─────────┘
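
The difference between regex and literal matching in the example above can be reproduced with a plain-Ruby sketch. contains_sketch is a hypothetical helper. It uses Ruby's regex engine, which is an assumption for illustration only: the library's regex syntax may differ from Ruby's in details.

```ruby
# Plain-Ruby sketch of regex vs. literal matching (hypothetical helper).
def contains_sketch(value, pattern, literal: false)
  return nil if value.nil? # nil in, nil out, as in the table above

  # literal: true compares the raw substring; otherwise pattern is a regex.
  literal ? value.include?(pattern) : value.match?(Regexp.new(pattern))
end

values = ["Crab", "cat and dog", "rab$bit", nil]
regex_hits = values.map { |v| contains_sketch(v, "cat|bit") }
literal_hits = values.map { |v| contains_sketch(v, "rab$", literal: true) }
# regex_hits   => [false, true, true, nil]
# literal_hits => [false, false, true, nil]
```

Read as a regex, "rab$" would match "Crab" because $ anchors the end of the string; with literal: true the dollar sign is an ordinary character, so only "rab$bit" matches.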

Parameters:

pattern (String) —
A valid regex pattern.
literal (Boolean) (defaults to: false) —
Treat pattern as a literal string.
strict (Boolean) (defaults to: true) —
Raise an error if the underlying pattern is not a valid regex; otherwise mask the result with a null value.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 577

def contains(pattern, literal: false, strict: true)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)._rbexpr
  Utils.wrap_expr(_rbexpr.str_contains(pattern, literal, strict))
end

#count_matches(pattern, literal: false) ⇒ `Expr` Also known as: count_match

Count all successive non-overlapping regex matches.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.count_match('\d').alias("count_digits")
  ]
)
# =>
# shape: (2, 1)
# ┌──────────────┐
# │ count_digits │
# │ ---          │
# │ u32          │
# ╞══════════════╡
# │ 5            │
# │ 6            │
# └──────────────┘

Parameters:

pattern (String) —
A valid regex pattern.
literal (Boolean) (defaults to: false) —
Treat pattern as a literal string.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 886

def count_matches(pattern, literal: false)
  pattern = Utils.parse_as_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_count_matches(pattern, literal))
end

#decode(encoding, strict: true) ⇒ `Expr`

Decode a value using the provided encoding.

Examples:

df = Polars::DataFrame.new({"encoded" => ["666f6f", "626172", nil]})
df.select(Polars.col("encoded").str.decode("hex"))
# =>
# shape: (3, 1)
# ┌───────────────┐
# │ encoded       │
# │ ---           │
# │ binary        │
# ╞═══════════════╡
# │ [binary data] │
# │ [binary data] │
# │ null          │
# └───────────────┘

Parameters:

encoding ("hex", "base64") —
The encoding to use.
strict (Boolean) (defaults to: true) —
How to handle invalid inputs:
- true: An error will be thrown if unable to decode a value.
- false: Unhandled values will be replaced with nil.
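
The strict flag can be illustrated with a plain-Ruby sketch for the "hex" case only. hex_decode_sketch is a hypothetical helper, and its validation rule (an even number of hex digits) is an assumption about what counts as invalid input; the library's own checks may differ.

```ruby
# Plain-Ruby sketch of strict vs. non-strict hex decoding (hypothetical helper).
def hex_decode_sketch(value, strict: true)
  return nil if value.nil?

  # Assumed validity rule: pairs of hexadecimal digits only.
  unless value.match?(/\A(?:\h\h)*\z/)
    raise ArgumentError, "invalid hex input: #{value.inspect}" if strict
    return nil # strict: false replaces the bad value with nil
  end

  [value].pack("H*") # binary string holding the decoded bytes
end

hex_decode_sketch("666f6f")             # => "foo" (as binary data)
hex_decode_sketch("zz", strict: false)  # => nil
```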

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 757

def decode(encoding, strict: true)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_decode(strict))
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_decode(strict))
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end

#encode(encoding) ⇒ `Expr`

Encode a value using the provided encoding.

Examples:

df = Polars::DataFrame.new({"strings" => ["foo", "bar", nil]})
df.select(Polars.col("strings").str.encode("hex"))
# =>
# shape: (3, 1)
# ┌─────────┐
# │ strings │
# │ ---     │
# │ str     │
# ╞═════════╡
# │ 666f6f  │
# │ 626172  │
# │ null    │
# └─────────┘

Parameters:

encoding ("hex", "base64") —
The encoding to use.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 788

def encode(encoding)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_encode)
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_encode)
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end

#ends_with(sub) ⇒ `Expr`

Check if string values end with a substring.

Examples:

df = Polars::DataFrame.new({"fruits" => ["apple", "mango", nil]})
df.with_column(
  Polars.col("fruits").str.ends_with("go").alias("has_suffix")
)
# =>
# shape: (3, 2)
# ┌────────┬────────────┐
# │ fruits ┆ has_suffix │
# │ ---    ┆ ---        │
# │ str    ┆ bool       │
# ╞════════╪════════════╡
# │ apple  ┆ false      │
# │ mango  ┆ true       │
# │ null   ┆ null       │
# └────────┴────────────┘

Using `ends_with` as a filter condition:

df.filter(Polars.col("fruits").str.ends_with("go"))
# =>
# shape: (1, 1)
# ┌────────┐
# │ fruits │
# │ ---    │
# │ str    │
# ╞════════╡
# │ mango  │
# └────────┘

Parameters:

sub (String) —
Suffix substring.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 617

def ends_with(sub)
  sub = Utils.expr_to_lit_or_expr(sub, str_to_lit: true)._rbexpr
  Utils.wrap_expr(_rbexpr.str_ends_with(sub))
end

#explode ⇒ `Expr`

Returns a column with a separate row for every string character.

Examples:

df = Polars::DataFrame.new({"a" => ["foo", "bar"]})
df.select(Polars.col("a").str.explode)
# =>
# shape: (6, 1)
# ┌─────┐
# │ a   │
# │ --- │
# │ str │
# ╞═════╡
# │ f   │
# │ o   │
# │ o   │
# │ b   │
# │ a   │
# │ r   │
# └─────┘

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 1114

def explode
  Utils.wrap_expr(_rbexpr.str_explode)
end

#extract(pattern, group_index: 1) ⇒ `Expr`

Extract the target capture group from provided patterns.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.extract('(\d+)')
  ]
)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ 123 │
# │ 678 │
# └─────┘
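
How group_index selects a capture group can be sketched in plain Ruby. extract_sketch is a hypothetical helper built on String#[] with a regex and a capture index; it uses Ruby's regex engine as a stand-in, which is an assumption for illustration.

```ruby
# Plain-Ruby sketch of capture-group selection (hypothetical helper).
def extract_sketch(value, pattern, group_index: 1)
  return nil if value.nil?

  # String#[] with (regex, n) returns capture group n, or nil without a match.
  value[Regexp.new(pattern), group_index]
end

extract_sketch("123 bla 45 asd", '(\d+)')                            # => "123"
extract_sketch("123 bla 45 asd", '(\d+) bla (\d+)', group_index: 2)  # => "45"
extract_sketch("123 bla 45 asd", '(\d+) bla (\d+)', group_index: 0)  # => "123 bla 45"
extract_sketch("no digits", '(\d+)')                                 # => nil
```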

Parameters:

pattern (String) —
A valid regex pattern.
group_index (Integer) (defaults to: 1) —
Index of the targeted capture group. Group 0 means the whole pattern; the first group begins at index 1. Defaults to the first capture group.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 826

def extract(pattern, group_index: 1)
  Utils.wrap_expr(_rbexpr.str_extract(pattern, group_index))
end

#extract_all(pattern) ⇒ `Expr`

Extracts all matches for the given regex pattern.

Extracts each successive non-overlapping regex match in an individual string as an array.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.extract_all('(\d+)').alias("extracted_nrs")
  ]
)
# =>
# shape: (2, 1)
# ┌────────────────┐
# │ extracted_nrs  │
# │ ---            │
# │ list[str]      │
# ╞════════════════╡
# │ ["123", "45"]  │
# │ ["678", "910"] │
# └────────────────┘
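
"Successive non-overlapping matches" corresponds to what String#scan returns in plain Ruby. The sketch below is an illustration outside Polars. It drops the parentheses from the pattern because Ruby's scan returns nested arrays when the pattern contains capture groups.

```ruby
# Plain-Ruby sketch: collecting and counting non-overlapping matches.
rows = ["123 bla 45 asd", "xyz 678 910t"]

all_numbers = rows.map { |s| s.scan(/\d+/) }
# => [["123", "45"], ["678", "910"]]

digit_counts = rows.map { |s| s.scan(/\d/).length }
# => [5, 6]
```

The second result matches the count_matches example earlier on this page, which counts single digits in the same input.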

Parameters:

pattern (String) —
A valid regex pattern.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 857

def extract_all(pattern)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_extract_all(pattern._rbexpr))
end

#json_extract(dtype = nil, infer_schema_length: 100) ⇒ `Expr`

Parse string values as JSON.

Raises an error if it encounters an invalid JSON string.

Examples:

df = Polars::DataFrame.new(
  {"json" => ['{"a":1, "b": true}', nil, '{"a":2, "b": false}']}
)
dtype = Polars::Struct.new([Polars::Field.new("a", Polars::Int64), Polars::Field.new("b", Polars::Boolean)])
df.select(Polars.col("json").str.json_extract(dtype))
# =>
# shape: (3, 1)
# ┌─────────────┐
# │ json        │
# │ ---         │
# │ struct[2]   │
# ╞═════════════╡
# │ {1,true}    │
# │ {null,null} │
# │ {2,false}   │
# └─────────────┘

Parameters:

dtype (Object) (defaults to: nil) —
The dtype to cast the extracted value to. If nil, the dtype will be inferred from the JSON value.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 689

def json_extract(dtype = nil, infer_schema_length: 100)
  if !dtype.nil?
    dtype = Utils.rb_type_to_dtype(dtype)
  end
  Utils.wrap_expr(_rbexpr.str_json_extract(dtype, infer_schema_length))
end

#json_path_match(json_path) ⇒ `Expr`

Extract the first match of json string with provided JSONPath expression.

Raises an error if it encounters an invalid JSON string. All return values are cast to Utf8 regardless of their original type.

See the JSONPath standard documentation for the query syntax.

Examples:

df = Polars::DataFrame.new(
  {"json_val" => ['{"a":"1"}', nil, '{"a":2}', '{"a":2.1}', '{"a":true}']}
)
df.select(Polars.col("json_val").str.json_path_match("$.a"))
# =>
# shape: (5, 1)
# ┌──────────┐
# │ json_val │
# │ ---      │
# │ str      │
# ╞══════════╡
# │ 1        │
# │ null     │
# │ 2        │
# │ 2.1      │
# │ true     │
# └──────────┘

Parameters:

json_path (String) —
A valid JSON path query string.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 727

def json_path_match(json_path)
  Utils.wrap_expr(_rbexpr.str_json_path_match(json_path))
end

#lengths ⇒ `Expr`

Note:

The returned lengths are equal to the number of bytes in the UTF-8 string. If you need the length in characters, use n_chars instead.

Get length of the strings as :u32 (as number of bytes).

Examples:

df = Polars::DataFrame.new({"s" => ["Café", nil, "345", "東京"]}).with_columns(
  [
    Polars.col("s").str.lengths.alias("length"),
    Polars.col("s").str.n_chars.alias("nchars")
  ]
)
df
# =>
# shape: (4, 3)
# ┌──────┬────────┬────────┐
# │ s    ┆ length ┆ nchars │
# │ ---  ┆ ---    ┆ ---    │
# │ str  ┆ u32    ┆ u32    │
# ╞══════╪════════╪════════╡
# │ Café ┆ 5      ┆ 4      │
# │ null ┆ null   ┆ null   │
# │ 345  ┆ 3      ┆ 3      │
# │ 東京 ┆ 6      ┆ 2      │
# └──────┴────────┴────────┘
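
The byte/character distinction in the table above is the same one plain Ruby makes between String#bytesize and String#length. The sketch assumes UTF-8 encoded strings (Ruby's default source encoding) and does not use Polars.

```ruby
# Plain-Ruby sketch: bytes vs. characters for UTF-8 strings.
values = ["Café", nil, "345", "東京"]

byte_lengths = values.map { |s| s&.bytesize } # "é" takes 2 bytes, each kanji 3
char_lengths = values.map { |s| s&.length }

# byte_lengths => [5, nil, 3, 6]
# char_lengths => [4, nil, 3, 2]
```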

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 242

def lengths
  Utils.wrap_expr(_rbexpr.str_len_bytes)
end

#ljust(length, fillchar = " ") ⇒ `Expr` Also known as: pad_end

Return the string left justified in a string of length length.

Padding is done using the specified fillchar. The original string is returned if length is less than or equal to s.length.

Examples:

df = Polars::DataFrame.new({"a" => ["cow", "monkey", nil, "hippopotamus"]})
df.select(Polars.col("a").str.ljust(8, "*"))
# =>
# shape: (4, 1)
# ┌──────────────┐
# │ a            │
# │ ---          │
# │ str          │
# ╞══════════════╡
# │ cow*****     │
# │ monkey**     │
# │ null         │
# │ hippopotamus │
# └──────────────┘
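
Ruby's built-in String#ljust and String#rjust follow the same rule (pad up to the target length, leave longer strings untouched), so the example can be checked without Polars. This is only an illustration of the semantics, not the library's implementation.

```ruby
# Plain-Ruby sketch: padding to a fixed width, nil values pass through.
values = ["cow", "monkey", nil, "hippopotamus"]

padded_end = values.map { |s| s&.ljust(8, "*") }
# => ["cow*****", "monkey**", nil, "hippopotamus"]

padded_start = values.map { |s| s&.rjust(8, "*") }
# => ["*****cow", "**monkey", nil, "hippopotamus"]
```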

Parameters:

length (Integer) —
Justify left to this length.
fillchar (String) (defaults to: " ") —
Fill with this ASCII character.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 509

def ljust(length, fillchar = " ")
  Utils.wrap_expr(_rbexpr.str_pad_end(length, fillchar))
end

#n_chars ⇒ `Expr`

Note:

For ASCII-only text, lengths returns the same values and is faster, because it counts bytes rather than characters.

Get length of the strings as :u32 (as number of chars).

Examples:

df = Polars::DataFrame.new({"s" => ["Café", nil, "345", "東京"]}).with_columns(
  [
    Polars.col("s").str.lengths.alias("length"),
    Polars.col("s").str.n_chars.alias("nchars")
  ]
)
df
# =>
# shape: (4, 3)
# ┌──────┬────────┬────────┐
# │ s    ┆ length ┆ nchars │
# │ ---  ┆ ---    ┆ ---    │
# │ str  ┆ u32    ┆ u32    │
# ╞══════╪════════╪════════╡
# │ Café ┆ 5      ┆ 4      │
# │ null ┆ null   ┆ null   │
# │ 345  ┆ 3      ┆ 3      │
# │ 東京 ┆ 6      ┆ 2      │
# └──────┴────────┴────────┘

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 274

def n_chars
  Utils.wrap_expr(_rbexpr.str_len_chars)
end

#parse_int(radix = 2, strict: true) ⇒ `Expr`

Parse integers with base radix from strings.

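Defaults to base 2. With strict: false, parse errors and overflows become null; with strict: true (the default) they raise an error. Note that the implementation shown below always calls to_integer with base: 2, so a radix other than 2 has no effect in this version. The result is cast to Int32.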

Examples:

df = Polars::DataFrame.new({"bin" => ["110", "101", "010", "invalid"]})
df.select(Polars.col("bin").str.parse_int(2, strict: false))
# =>
# shape: (4, 1)
# ┌──────┐
# │ bin  │
# │ ---  │
# │ i32  │
# ╞══════╡
# │ 6    │
# │ 5    │
# │ 2    │
# │ null │
# └──────┘

Parameters:

radix (Integer) (defaults to: 2) —
Positive integer which is the base of the string we are parsing. Default: 2.
strict (Boolean) (defaults to: true) —
If true (default), any parse error or overflow is raised as a ComputeError. If false, such values are silently converted to null.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 1192

def parse_int(radix = 2, strict: true)
  to_integer(base: 2, strict: strict).cast(Int32, strict: strict)
end

#replace(pattern, value, literal: false, n: 1) ⇒ `Expr`

Replace first matching regex/literal substring with a new string value.

Examples:

df = Polars::DataFrame.new({"id" => [1, 2], "text" => ["123abc", "abc456"]})
df.with_column(
  Polars.col("text").str.replace('abc\b', "ABC")
)
# =>
# shape: (2, 2)
# ┌─────┬────────┐
# │ id  ┆ text   │
# │ --- ┆ ---    │
# │ i64 ┆ str    │
# ╞═════╪════════╡
# │ 1   ┆ 123ABC │
# │ 2   ┆ abc456 │
# └─────┴────────┘

Parameters:

pattern (String) —
Regex pattern.
value (String) —
Replacement string.
literal (Boolean) (defaults to: false) —
Treat pattern as a literal string.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 1026

def replace(pattern, value, literal: false, n: 1)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  value = Utils.expr_to_lit_or_expr(value, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_n(pattern._rbexpr, value._rbexpr, literal, n))
end

#replace_all(pattern, value, literal: false) ⇒ `Expr`

Replace all matching regex/literal substrings with a new string value.

Examples:

df = Polars::DataFrame.new({"id" => [1, 2], "text" => ["abcabc", "123a123"]})
df.with_column(Polars.col("text").str.replace_all("a", "-"))
# =>
# shape: (2, 2)
# ┌─────┬─────────┐
# │ id  ┆ text    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 1   ┆ -bc-bc  │
# │ 2   ┆ 123-123 │
# └─────┴─────────┘

Parameters:

pattern (String) —
Regex pattern.
value (String) —
Replacement string.
literal (Boolean) (defaults to: false) —
Treat pattern as a literal string.
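
The contrast between replace (first match) and replace_all (every match), and the effect of literal, can be sketched with Ruby's String#sub and String#gsub. replace_sketch and its all: keyword are hypothetical names used only for this illustration; replacement-string escapes in Ruby may also differ from the library's.

```ruby
# Plain-Ruby sketch of first-match vs. all-match replacement (hypothetical helper).
def replace_sketch(value, pattern, replacement, literal: false, all: false)
  return nil if value.nil?

  # A String pattern is matched literally by sub/gsub; a Regexp is not.
  matcher = literal ? pattern : Regexp.new(pattern)
  all ? value.gsub(matcher, replacement) : value.sub(matcher, replacement)
end

replace_sketch("123abc", 'abc\b', "ABC")                     # => "123ABC"
replace_sketch("abc456", 'abc\b', "ABC")                     # => "abc456"
replace_sketch("abcabc", "a", "-", all: true)                # => "-bc-bc"
replace_sketch("a.b.c", ".", "-", literal: true, all: true)  # => "a-b-c"
replace_sketch("a.b.c", ".", "-", all: true)                 # => "-----"
```

The last two calls show why literal matters: as a regex, "." matches every character.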

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 1056

def replace_all(pattern, value, literal: false)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  value = Utils.expr_to_lit_or_expr(value, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_all(pattern._rbexpr, value._rbexpr, literal))
end

#rjust(length, fillchar = " ") ⇒ `Expr` Also known as: pad_start

Return the string right justified in a string of length length.

Padding is done using the specified fillchar. The original string is returned if length is less than or equal to s.length.

Examples:

df = Polars::DataFrame.new({"a" => ["cow", "monkey", nil, "hippopotamus"]})
df.select(Polars.col("a").str.rjust(8, "*"))
# =>
# shape: (4, 1)
# ┌──────────────┐
# │ a            │
# │ ---          │
# │ str          │
# ╞══════════════╡
# │ *****cow     │
# │ **monkey     │
# │ null         │
# │ hippopotamus │
# └──────────────┘

Parameters:

length (Integer) —
Justify right to this length.
fillchar (String) (defaults to: " ") —
Fill with this ASCII character.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 542

def rjust(length, fillchar = " ")
  Utils.wrap_expr(_rbexpr.str_pad_start(length, fillchar))
end

#slice(offset, length = nil) ⇒ `Expr`

Create subslices of the string values of a Utf8 Series.

Examples:

df = Polars::DataFrame.new({"s" => ["pear", nil, "papaya", "dragonfruit"]})
df.with_column(
  Polars.col("s").str.slice(-3).alias("s_sliced")
)
# =>
# shape: (4, 2)
# ┌─────────────┬──────────┐
# │ s           ┆ s_sliced │
# │ ---         ┆ ---      │
# │ str         ┆ str      │
# ╞═════════════╪══════════╡
# │ pear        ┆ ear      │
# │ null        ┆ null     │
# │ papaya      ┆ aya      │
# │ dragonfruit ┆ uit      │
# └─────────────┴──────────┘

Parameters:

offset (Integer) —
Start index. Negative indexing is supported.
length (Integer) (defaults to: nil) —
Length of the slice. If set to nil (default), the slice is taken to the end of the string.
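
Negative offsets and the optional length behave like Ruby's own string indexing for the inputs shown above. slice_sketch is a hypothetical plain-Ruby helper; it counts characters, and whether the library counts bytes or characters for non-ASCII input is not stated on this page, so treat that as an assumption.

```ruby
# Plain-Ruby sketch of offset/length slicing (hypothetical helper).
def slice_sketch(value, offset, length = nil)
  return nil if value.nil?

  # Without a length the slice runs to the end of the string.
  length.nil? ? value[offset..] : value[offset, length]
end

slice_sketch("pear", -3)            # => "ear"
slice_sketch("dragonfruit", -3)     # => "uit"
slice_sketch("dragonfruit", -3, 2)  # => "ui"
slice_sketch("papaya", 1, 3)        # => "apa"
```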

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 1089

def slice(offset, length = nil)
  Utils.wrap_expr(_rbexpr.str_slice(offset, length))
end

#split(by, inclusive: false) ⇒ `Expr`

Split the string by a substring.

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", "foo-bar", "foo bar baz"]})
df.select(Polars.col("s").str.split(" "))
# =>
# shape: (3, 1)
# ┌───────────────────────┐
# │ s                     │
# │ ---                   │
# │ list[str]             │
# ╞═══════════════════════╡
# │ ["foo", "bar"]        │
# │ ["foo-bar"]           │
# │ ["foo", "bar", "baz"] │
# └───────────────────────┘

Parameters:

by (String) —
Substring to split by.
inclusive (Boolean) (defaults to: false) —
If true, include the split character/string in the results.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 915

def split(by, inclusive: false)
  by = Utils.parse_as_expression(by, str_as_lit: true)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_inclusive(by))
  else
    Utils.wrap_expr(_rbexpr.str_split(by))
  end
end

#split_exact(by, n, inclusive: false) ⇒ `Expr`

Split the string by a substring using n splits.

Results in a struct of n+1 fields.

If it cannot make n splits, the remaining field elements will be null.

Examples:

df = Polars::DataFrame.new({"x" => ["a_1", nil, "c", "d_4"]})
df.select(
  [
    Polars.col("x").str.split_exact("_", 1).alias("fields")
  ]
)
# =>
# shape: (4, 1)
# ┌─────────────┐
# │ fields      │
# │ ---         │
# │ struct[2]   │
# ╞═════════════╡
# │ {"a","1"}   │
# │ {null,null} │
# │ {"c",null}  │
# │ {"d","4"}   │
# └─────────────┘

Parameters:

by (String) —
Substring to split by.
n (Integer) —
Number of splits to make.
inclusive (Boolean) (defaults to: false) —
If true, include the split character/string in the results.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 958

def split_exact(by, n, inclusive: false)
  by = Utils.parse_as_expression(by, str_as_lit: true)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_exact_inclusive(by, n))
  else
    Utils.wrap_expr(_rbexpr.str_split_exact(by, n))
  end
end

#splitn(by, n) ⇒ `Expr`

Split the string by a substring, restricted to returning at most n items.

If the number of possible splits is less than n-1, the remaining field elements will be null. If the number of possible splits is n-1 or greater, the last (nth) substring will contain the remainder of the string.

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", nil, "foo-bar", "foo bar baz"]})
df.select(Polars.col("s").str.splitn(" ", 2).alias("fields"))
# =>
# shape: (4, 1)
# ┌───────────────────┐
# │ fields            │
# │ ---               │
# │ struct[2]         │
# ╞═══════════════════╡
# │ {"foo","bar"}     │
# │ {null,null}       │
# │ {"foo-bar",null}  │
# │ {"foo","bar baz"} │
# └───────────────────┘
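
The "at most n items, remainder in the last one" rule matches Ruby's String#split with a limit, plus nil padding for short rows. splitn_sketch is a hypothetical plain-Ruby helper that returns an array where the library returns a struct.

```ruby
# Plain-Ruby sketch of splitn semantics (hypothetical helper).
def splitn_sketch(value, by, n)
  return Array.new(n) if value.nil? # a nil input yields n nil fields

  # Escape the separator so it is treated as a plain substring.
  parts = value.split(Regexp.new(Regexp.escape(by)), n)
  parts = [value] if parts.empty? # Ruby returns [] for an empty string

  # Pad with nil when fewer than n items were produced.
  parts + Array.new(n - parts.length)
end

splitn_sketch("foo bar", " ", 2)      # => ["foo", "bar"]
splitn_sketch(nil, " ", 2)            # => [nil, nil]
splitn_sketch("foo-bar", " ", 2)      # => ["foo-bar", nil]
splitn_sketch("foo bar baz", " ", 2)  # => ["foo", "bar baz"]
```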

Parameters:

by (String) —
Substring to split by.
n (Integer) —
Max number of items to return.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 995

def splitn(by, n)
  by = Utils.parse_as_expression(by, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_splitn(by, n))
end

#starts_with(sub) ⇒ `Expr`

Check if string values start with a substring.

Examples:

df = Polars::DataFrame.new({"fruits" => ["apple", "mango", nil]})
df.with_column(
  Polars.col("fruits").str.starts_with("app").alias("has_prefix")
)
# =>
# shape: (3, 2)
# ┌────────┬────────────┐
# │ fruits ┆ has_prefix │
# │ ---    ┆ ---        │
# │ str    ┆ bool       │
# ╞════════╪════════════╡
# │ apple  ┆ true       │
# │ mango  ┆ false      │
# │ null   ┆ null       │
# └────────┴────────────┘

Using `starts_with` as a filter condition:

df.filter(Polars.col("fruits").str.starts_with("app"))
# =>
# shape: (1, 1)
# ┌────────┐
# │ fruits │
# │ ---    │
# │ str    │
# ╞════════╡
# │ apple  │
# └────────┘

Parameters:

sub (String) —
Prefix substring.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 657

def starts_with(sub)
  sub = Utils.expr_to_lit_or_expr(sub, str_to_lit: true)._rbexpr
  Utils.wrap_expr(_rbexpr.str_starts_with(sub))
end

#strip_chars(characters = nil) ⇒ `Expr` Also known as: strip

Remove leading and trailing whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.strip)
# =>
# shape: (3, 1)
# ┌───────┐
# │ foo   │
# │ ---   │
# │ str   │
# ╞═══════╡
# │ lead  │
# │ trail │
# │ both  │
# └───────┘

Parameters:

characters (String, nil) (defaults to: nil) —
An optional single character that should be trimmed.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 379

def strip_chars(characters = nil)
  characters = Utils.parse_as_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars(characters))
end

#strip_chars_end(characters = nil) ⇒ `Expr` Also known as: rstrip

Remove trailing whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.rstrip)
# =>
# shape: (3, 1)
# ┌───────┐
# │ foo   │
# │ ---   │
# │ str   │
# ╞═══════╡
# │  lead │
# │ trail │
# │  both │
# └───────┘

Parameters:

characters (String, nil) (defaults to: nil) —
An optional single character that should be trimmed.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 433

def strip_chars_end(characters = nil)
  characters = Utils.parse_as_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars_end(characters))
end

#strip_chars_start(characters = nil) ⇒ `Expr` Also known as: lstrip

Remove leading whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.lstrip)
# =>
# shape: (3, 1)
# ┌────────┐
# │ foo    │
# │ ---    │
# │ str    │
# ╞════════╡
# │ lead   │
# │ trail  │
# │ both   │
# └────────┘

Parameters:

characters (String, nil) (defaults to: nil) —
An optional single character that should be trimmed.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 406

def strip_chars_start(characters = nil)
  characters = Utils.parse_as_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars_start(characters))
end

#strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false) ⇒ `Expr`

Note:

When parsing a Datetime, the column precision is inferred from the format string, if given, e.g. "%F %T%.3f" => Datetime("ms"). If no fractional-second component is found, the default is "us".

Parse a Utf8 expression to a Date/Datetime/Time type.

Examples:

Dealing with a consistent format:

s = Polars::Series.new(["2020-01-01 01:00Z", "2020-01-01 02:00Z"])
s.str.strptime(Polars::Datetime, "%Y-%m-%d %H:%M%#z")
# =>
# shape: (2,)
# Series: '' [datetime[μs, UTC]]
# [
#         2020-01-01 01:00:00 UTC
#         2020-01-01 02:00:00 UTC
# ]

Dealing with different formats:

s = Polars::Series.new(
  "date",
  [
    "2021-04-22",
    "2022-01-04 00:00:00",
    "01/31/22",
    "Sun Jul  8 00:34:60 2001",
  ]
)
s.to_frame.select(
  Polars.coalesce(
    Polars.col("date").str.strptime(Polars::Date, "%F", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%F %T", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%D", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%c", strict: false)
  )
).to_series
# =>
# shape: (4,)
# Series: 'date' [date]
# [
#         2021-04-22
#         2022-01-04
#         2022-01-31
#         2001-07-08
# ]

Parameters:

dtype (Object) —
The data type to convert into. Can be either Date, Datetime, or Time.
format (String) (defaults to: nil) —
Format to use, refer to the chrono strftime documentation for specification. Example: "%y-%m-%d".
strict (Boolean) (defaults to: true) —
Raise an error if any conversion fails.
exact (Boolean) (defaults to: true) —
- If true, require an exact format match.
- If false, allow the format to match anywhere in the target string.
cache (Boolean) (defaults to: true) —
Use a cache of unique, converted dates to apply the conversion.
utc (Boolean) (defaults to: false) —
Parse timezone aware datetimes as UTC. This may be useful if you have data with mixed offsets.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 197

def strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false)
  _validate_format_argument(format)

  if dtype == Date
    to_date(format, strict: strict, exact: exact, cache: cache)
  elsif dtype == Datetime || dtype.is_a?(Datetime)
    dtype = Datetime.new if dtype == Datetime
    time_unit = dtype.time_unit
    time_zone = dtype.time_zone
    to_datetime(format, time_unit: time_unit, time_zone: time_zone, strict: strict, exact: exact, cache: cache)
  elsif dtype == Time
    to_time(format, strict: strict, cache: cache)
  else
    raise ArgumentError, "dtype should be of type {Date, Datetime, Time}"
  end
end

#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ `Expr`

Convert a Utf8 column into a Date column.

Examples:

s = Polars::Series.new(["2020/01/01", "2020/02/01", "2020/03/01"])
s.str.to_date
# =>
# shape: (3,)
# Series: '' [date]
# [
#         2020-01-01
#         2020-02-01
#         2020-03-01
# ]

Parameters:

format (String) (defaults to: nil) —
Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d". If set to nil (default), the format is inferred from the data.
strict (Boolean) (defaults to: true) —
Raise an error if any conversion fails.
exact (Boolean) (defaults to: true) —
Require an exact format match. If false, allow the format to match anywhere in the target string.
cache (Boolean) (defaults to: true) —
Use a cache of unique, converted dates to apply the conversion.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 40

def to_date(format = nil, strict: true, exact: true, cache: true)
  _validate_format_argument(format)
  Utils.wrap_expr(self._rbexpr.str_to_date(format, strict, exact, cache))
end

#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, use_earliest: nil, ambiguous: "raise") ⇒ `Expr`

Convert a Utf8 column into a Datetime column.

Examples:

s = Polars::Series.new(["2020-01-01 01:00Z", "2020-01-01 02:00Z"])
s.str.to_datetime("%Y-%m-%d %H:%M%#z")
# =>
# shape: (2,)
# Series: '' [datetime[μs, UTC]]
# [
#         2020-01-01 01:00:00 UTC
#         2020-01-01 02:00:00 UTC
# ]

Parameters:

format (String) (defaults to: nil) —
Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d %H:%M:%S". If set to nil (default), the format is inferred from the data.
time_unit ("us", "ns", "ms") (defaults to: nil) —
Unit of time for the resulting Datetime column. If set to nil (default), the time unit is inferred from the format string if given, e.g. "%F %T%.3f" => Datetime("ms"). If no fractional-second component is found, the default is "us".
time_zone (String) (defaults to: nil) —
Time zone for the resulting Datetime column.
strict (Boolean) (defaults to: true) —
Raise an error if any conversion fails.
exact (Boolean) (defaults to: true) —
Require an exact format match. If false, allow the format to match anywhere in the target string.
cache (Boolean) (defaults to: true) —
Use a cache of unique, converted datetimes to apply the conversion.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 79

def to_datetime(
  format = nil,
  time_unit: nil,
  time_zone: nil,
  strict: true,
  exact: true,
  cache: true,
  use_earliest: nil,
  ambiguous: "raise"
)
  _validate_format_argument(format)
  ambiguous = Utils.rename_use_earliest_to_ambiguous(use_earliest, ambiguous)
  ambiguous = Polars.lit(ambiguous) unless ambiguous.is_a?(Expr)
  Utils.wrap_expr(
    self._rbexpr.str_to_datetime(
      format,
      time_unit,
      time_zone,
      strict,
      exact,
      cache,
      ambiguous._rbexpr
    )
  )
end

#to_integer(base: 10, strict: true) ⇒ `Expr`

Convert a Utf8 column into an Int64 column, parsing each string as an integer in the given base.

Examples:

df = Polars::DataFrame.new({"bin" => ["110", "101", "010", "invalid"]})
df.with_columns(Polars.col("bin").str.to_integer(base: 2, strict: false).alias("parsed"))
# =>
# shape: (4, 2)
# ┌─────────┬────────┐
# │ bin     ┆ parsed │
# │ ---     ┆ ---    │
# │ str     ┆ i64    │
# ╞═════════╪════════╡
# │ 110     ┆ 6      │
# │ 101     ┆ 5      │
# │ 010     ┆ 2      │
# │ invalid ┆ null   │
# └─────────┴────────┘

df = Polars::DataFrame.new({"hex" => ["fa1e", "ff00", "cafe", nil]})
df.with_columns(Polars.col("hex").str.to_integer(base: 16, strict: true).alias("parsed"))
# =>
# shape: (4, 2)
# ┌──────┬────────┐
# │ hex  ┆ parsed │
# │ ---  ┆ ---    │
# │ str  ┆ i64    │
# ╞══════╪════════╡
# │ fa1e ┆ 64030  │
# │ ff00 ┆ 65280  │
# │ cafe ┆ 51966  │
# │ null ┆ null   │
# └──────┴────────┘

Parameters:

base (Integer) (defaults to: 10) —
Base (radix) of the integers being parsed. Must be a positive integer.
strict (Boolean) (defaults to: true) —
If true (default), a value that cannot be parsed or that overflows raises a ComputeError. If false, such values are silently converted to null.
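
The difference between strict and non-strict parsing can be modelled in plain Ruby. This is a minimal sketch under stated assumptions, not how Polars implements it: `parse_int_sketch` is a hypothetical helper, it raises ArgumentError where Polars reports a ComputeError, and Ruby's `Integer()` is more permissive than Polars (it accepts underscores and surrounding whitespace, for instance).

```ruby
# Hypothetical helper (not part of Polars) modelling strict vs. non-strict parsing.
INT64_RANGE = (-2**63..2**63 - 1)

def parse_int_sketch(str, base: 10, strict: true)
  return nil if str.nil? # nulls pass through in either mode

  # Integer(..., exception: false) returns nil instead of raising on bad input.
  value = Integer(str, base, exception: false)
  # Treat values outside the Int64 range as failures as well.
  value = nil unless value && INT64_RANGE.cover?(value)

  raise ArgumentError, "cannot parse #{str.inspect} in base #{base}" if value.nil? && strict
  value
end

parse_int_sketch("110", base: 2)                    # => 6
parse_int_sketch("fa1e", base: 16)                  # => 64030
parse_int_sketch("invalid", base: 2, strict: false) # => nil
```

With strict left at true, the "invalid" input raises instead of producing nil, which mirrors the two examples above.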

Returns:

(Expr)




# File 'lib/polars/string_expr.rb', line 1160

def to_integer(base: 10, strict: true)
  Utils.wrap_expr(_rbexpr.str_to_integer(base, strict))
end

#to_lowercase ⇒ `Expr`

Transform to lowercase variant.

Examples:

df = Polars::DataFrame.new({"foo" => ["CAT", "DOG"]})
df.select(Polars.col("foo").str.to_lowercase)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ cat │
# │ dog │
# └─────┘

Returns:

(Expr)




# File 'lib/polars/string_expr.rb', line 354

def to_lowercase
  Utils.wrap_expr(_rbexpr.str_to_lowercase)
end

#to_time(format = nil, strict: true, cache: true) ⇒ `Expr`

Convert a Utf8 column into a Time column.

Examples:

s = Polars::Series.new(["01:00", "02:00", "03:00"])
s.str.to_time("%H:%M")
# =>
# shape: (3,)
# Series: '' [time]
# [
#         01:00:00
#         02:00:00
#         03:00:00
# ]

Parameters:

format (String) (defaults to: nil) —
Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%H:%M:%S". If set to nil (default), the format is inferred from the data.
strict (Boolean) (defaults to: true) —
Raise an error if any conversion fails.
cache (Boolean) (defaults to: true) —
Use a cache of unique, converted times to apply the conversion.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 130

def to_time(format = nil, strict: true, cache: true)
  _validate_format_argument(format)
  Utils.wrap_expr(_rbexpr.str_to_time(format, strict, cache))
end

#to_uppercase ⇒ `Expr`

Transform to uppercase variant.

Examples:

df = Polars::DataFrame.new({"foo" => ["cat", "dog"]})
df.select(Polars.col("foo").str.to_uppercase)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ CAT │
# │ DOG │
# └─────┘

Returns:

(Expr)




# File 'lib/polars/string_expr.rb', line 333

def to_uppercase
  Utils.wrap_expr(_rbexpr.str_to_uppercase)
end

#zfill(alignment) ⇒ `Expr`

Pad the string on the left with zeroes.

Return a copy of the string, left-filled with ASCII '0' digits until it reaches the length given by alignment.

A leading sign prefix ('+' or '-') is kept in front: the padding is inserted after the sign character rather than before it. The original string is returned unchanged if alignment is less than or equal to the string's length.

Examples:

df = Polars::DataFrame.new(
  {
    "num" => [-10, -1, 0, 1, 10, 100, 1000, 10000, 100000, 1000000, nil]
  }
)
df.with_column(Polars.col("num").cast(String).str.zfill(5))
# =>
# shape: (11, 1)
# ┌─────────┐
# │ num     │
# │ ---     │
# │ str     │
# ╞═════════╡
# │ -0010   │
# │ -0001   │
# │ 00000   │
# │ 00001   │
# │ …       │
# │ 10000   │
# │ 100000  │
# │ 1000000 │
# │ null    │
# └─────────┘

Parameters:

alignment (Integer) —
Pad the value with zeroes up to this total length.
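
The sign-aware padding rule can be sketched in plain Ruby to make the behavior concrete. `zfill_sketch` is a hypothetical helper, not the Polars implementation, and it assumes length is measured in characters.

```ruby
# Hypothetical helper (not part of Polars) modelling sign-aware zero padding.
def zfill_sketch(str, width)
  return nil if str.nil?                # nulls stay null
  return str if width <= str.length     # already long enough: return unchanged

  # Split off a leading '+' or '-' so the zeroes go after the sign.
  sign = str.start_with?("+", "-") ? str[0] : ""
  digits = str[sign.length..-1]
  sign + digits.rjust(width - sign.length, "0")
end

zfill_sketch("-10", 5)     # => "-0010"
zfill_sketch("1", 5)       # => "00001"
zfill_sketch("1000000", 5) # => "1000000"
```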

Returns:

(Expr)




# File 'lib/polars/string_expr.rb', line 477

def zfill(alignment)
  Utils.wrap_expr(_rbexpr.str_zfill(alignment))
end

