Class: Polars::StringExpr

Inherits: Object

Defined in: lib/polars/string_expr.rb

Overview

Namespace for string related expressions.

Instance Method Summary

#concat(delimiter = "-") ⇒ Expr
Vertically concatenate the values in the Series into a single string value.
#contains(pattern, literal: false, strict: true) ⇒ Expr
Check if string contains a substring that matches a regex.
#count_match(pattern) ⇒ Expr
Count all successive non-overlapping regex matches.
#decode(encoding, strict: true) ⇒ Expr
Decode a value using the provided encoding.
#encode(encoding) ⇒ Expr
Encode a value using the provided encoding.
#ends_with(sub) ⇒ Expr
Check if string values end with a substring.
#explode ⇒ Expr
Returns a column with a separate row for every string character.
#extract(pattern, group_index: 1) ⇒ Expr
Extract the target capture group from provided patterns.
#extract_all(pattern) ⇒ Expr
Extracts all matches for the given regex pattern.
#json_extract(dtype = nil, infer_schema_length: 100) ⇒ Expr
Parse string values as JSON.
#json_path_match(json_path) ⇒ Expr
Extract the first match of a JSON string with the provided JSONPath expression.
#lengths ⇒ Expr
Get length of the strings as :u32 (as number of bytes).
#ljust(width, fillchar = " ") ⇒ Expr
Return the string left justified in a string of length width.
#lstrip(matches = nil) ⇒ Expr
Remove leading whitespace.
#n_chars ⇒ Expr
Get length of the strings as :u32 (as number of chars).
#parse_int(radix = 2, strict: true) ⇒ Expr
Parse integers with base radix from strings.
#replace(pattern, value, literal: false, n: 1) ⇒ Expr
Replace the first n matching regex/literal substrings (only the first one by default) with a new string value.
#replace_all(pattern, value, literal: false) ⇒ Expr
Replace all matching regex/literal substrings with a new string value.
#rjust(width, fillchar = " ") ⇒ Expr
Return the string right justified in a string of length width.
#rstrip(matches = nil) ⇒ Expr
Remove trailing whitespace.
#slice(offset, length = nil) ⇒ Expr
Create subslices of the string values of a Utf8 Series.
#split(by, inclusive: false) ⇒ Expr
Split the string by a substring.
#split_exact(by, n, inclusive: false) ⇒ Expr
Split the string by a substring using n splits.
#splitn(by, n) ⇒ Expr
Split the string by a substring, restricted to returning at most n items.
#starts_with(sub) ⇒ Expr
Check if string values start with a substring.
#strip(matches = nil) ⇒ Expr
Remove leading and trailing whitespace.
#strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false) ⇒ Expr
Parse a Utf8 expression to a Date/Datetime/Time type.
#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Expr
Convert a Utf8 column into a Date column.
#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true) ⇒ Expr
Convert a Utf8 column into a Datetime column.
#to_lowercase ⇒ Expr
Transform to lowercase variant.
#to_time(format = nil, strict: true, cache: true) ⇒ Expr
Convert a Utf8 column into a Time column.
#to_uppercase ⇒ Expr
Transform to uppercase variant.
#zfill(alignment) ⇒ Expr
Fills the string with zeroes.

Instance Method Details

#concat(delimiter = "-") ⇒ `Expr`

Vertically concatenate the values in the Series into a single string value.

Examples:

df = Polars::DataFrame.new({"foo" => [1, nil, 2]})
df.select(Polars.col("foo").str.concat("-"))
# =>
# shape: (1, 1)
# ┌──────────┐
# │ foo      │
# │ ---      │
# │ str      │
# ╞══════════╡
# │ 1-null-2 │
# └──────────┘
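As a rough plain-Ruby illustration of the output above (not how Polars implements `concat`), the hypothetical helper `concat_sketch` joins an array's values with the delimiter and renders nil as the text `null`, as the example shows:

```ruby
# Hypothetical helper, not part of Polars: joins values with a delimiter,
# rendering nil as "null" as in the example output above.
def concat_sketch(values, delimiter = "-")
  values.map { |v| v.nil? ? "null" : v.to_s }.join(delimiter)
end

concat_sketch([1, nil, 2]) # => "1-null-2"
```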

Parameters:

delimiter (String) (defaults to: "-") —
The delimiter to insert between consecutive string values.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 292

def concat(delimiter = "-")
  Utils.wrap_expr(_rbexpr.str_concat(delimiter))
end

#contains(pattern, literal: false, strict: true) ⇒ `Expr`

Check if string contains a substring that matches a regex.

Examples:

df = Polars::DataFrame.new({"a" => ["Crab", "cat and dog", "rab$bit", nil]})
df.select(
  [
    Polars.col("a"),
    Polars.col("a").str.contains("cat|bit").alias("regex"),
    Polars.col("a").str.contains("rab$", literal: true).alias("literal")
  ]
)
# =>
# shape: (4, 3)
# ┌─────────────┬───────┬─────────┐
# │ a           ┆ regex ┆ literal │
# │ ---         ┆ ---   ┆ ---     │
# │ str         ┆ bool  ┆ bool    │
# ╞═════════════╪═══════╪═════════╡
# │ Crab        ┆ false ┆ false   │
# │ cat and dog ┆ true  ┆ false   │
# │ rab$bit     ┆ true  ┆ true    │
# │ null        ┆ null  ┆ null    │
# └─────────────┴───────┴─────────┘
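The two columns above differ only in how the pattern is interpreted. A minimal plain-Ruby sketch of that distinction follows; `contains_sketch` is a hypothetical helper, and Ruby's Regexp dialect is not identical to the Rust `regex` syntax Polars uses (for example, Ruby's `$` matches at the end of any line), so treat it as an approximation:

```ruby
# Hypothetical helper, not part of Polars: approximates contains for one value.
def contains_sketch(s, pattern, literal: false)
  return nil if s.nil? # a null input stays null

  if literal
    s.include?(pattern) # plain substring test; "$" is an ordinary character
  else
    Regexp.new(pattern).match?(s) # regex search anywhere in the string
  end
end

values = ["Crab", "cat and dog", "rab$bit", nil]
values.map { |s| contains_sketch(s, "cat|bit") }             # => [false, true, true, nil]
values.map { |s| contains_sketch(s, "rab$", literal: true) } # => [false, false, true, nil]
```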

Parameters:

pattern (String) —
A valid regex pattern.
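literal (Boolean) (defaults to: false) —
Treat pattern as a literal string.
strict (Boolean) (defaults to: true) —
Raise an error if the underlying pattern is not a valid regex; otherwise, mask out with a null value.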

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 558

def contains(pattern, literal: false, strict: true)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)._rbexpr
  Utils.wrap_expr(_rbexpr.str_contains(pattern, literal, strict))
end

#count_match(pattern) ⇒ `Expr`

Count all successive non-overlapping regex matches.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.count_match('\d').alias("count_digits")
  ]
)
# =>
# shape: (2, 1)
# ┌──────────────┐
# │ count_digits │
# │ ---          │
# │ u32          │
# ╞══════════════╡
# │ 5            │
# │ 6            │
# └──────────────┘

Parameters:

pattern (String) —
A valid regex pattern.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 867

def count_match(pattern)
  Utils.wrap_expr(_rbexpr.count_match(pattern))
end

#decode(encoding, strict: true) ⇒ `Expr`

Decode a value using the provided encoding.

Examples:

df = Polars::DataFrame.new({"encoded" => ["666f6f", "626172", nil]})
df.select(Polars.col("encoded").str.decode("hex"))
# =>
# shape: (3, 1)
# ┌───────────────┐
# │ encoded       │
# │ ---           │
# │ binary        │
# ╞═══════════════╡
# │ [binary data] │
# │ [binary data] │
# │ null          │
# └───────────────┘

Parameters:

encoding ("hex", "base64") —
The encoding to use.
strict (Boolean) (defaults to: true) —
How to handle invalid inputs:
- true: An error will be thrown if unable to decode a value.
- false: Unhandled values will be replaced with nil.
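The effect of `strict` can be sketched in plain Ruby with `Array#pack` and `String#unpack1`. The helper `decode_sketch` is hypothetical and only approximates the documented behavior; it is not the Polars implementation, and Polars returns a binary column rather than Ruby strings:

```ruby
# Hypothetical helper, not part of Polars: decodes one value and applies the strict rule.
def decode_sketch(s, encoding, strict: true)
  return nil if s.nil? # a null input stays null

  decoded =
    case encoding
    when "hex"
      # pack("H*") does not validate its input, so check for pairs of hex digits first
      s.match?(/\A(?:\h\h)*\z/) ? [s].pack("H*") : nil
    when "base64"
      begin
        s.unpack1("m0") # "m0" is strict Base64 and raises ArgumentError on bad input
      rescue ArgumentError
        nil
      end
    else
      raise ArgumentError, "encoding must be one of {'hex', 'base64'}, got #{encoding}"
    end

  # strict: true turns an undecodable value into an error; strict: false yields nil
  raise ArgumentError, "unable to decode #{s.inspect}" if decoded.nil? && strict
  decoded
end

decode_sketch("666f6f", "hex")             # => "foo" (binary string)
decode_sketch("xyz", "hex", strict: false) # => nil
```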

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 738

def decode(encoding, strict: true)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_decode(strict))
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_decode(strict))
  else
    raise ArgumentError, "encoding must be one of {'hex', 'base64'}, got #{encoding}"
  end
end

#encode(encoding) ⇒ `Expr`

Encode a value using the provided encoding.
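For reference, the same byte-level encodings are available in core Ruby. The hypothetical helper `encode_sketch` below reproduces the hex values in the example that follows; it is an illustration, not the Polars code path:

```ruby
# Hypothetical helper, not part of Polars: encodes one value as hex or Base64.
def encode_sketch(s, encoding)
  return nil if s.nil? # a null input stays null

  case encoding
  when "hex" then s.unpack1("H*")   # each byte as two lowercase hex digits
  when "base64" then [s].pack("m0") # strict Base64 without line breaks
  else raise ArgumentError, "encoding must be one of {'hex', 'base64'}, got #{encoding}"
  end
end

["foo", "bar", nil].map { |s| encode_sketch(s, "hex") } # => ["666f6f", "626172", nil]
```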

Examples:

df = Polars::DataFrame.new({"strings" => ["foo", "bar", nil]})
df.select(Polars.col("strings").str.encode("hex"))
# =>
# shape: (3, 1)
# ┌─────────┐
# │ strings │
# │ ---     │
# │ str     │
# ╞═════════╡
# │ 666f6f  │
# │ 626172  │
# │ null    │
# └─────────┘

Parameters:

encoding ("hex", "base64") —
The encoding to use.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 769

def encode(encoding)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_encode)
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_encode)
  else
    raise ArgumentError, "encoding must be one of {'hex', 'base64'}, got #{encoding}"
  end
end

#ends_with(sub) ⇒ `Expr`

Check if string values end with a substring.

Examples:

df = Polars::DataFrame.new({"fruits" => ["apple", "mango", nil]})
df.with_column(
  Polars.col("fruits").str.ends_with("go").alias("has_suffix")
)
# =>
# shape: (3, 2)
# ┌────────┬────────────┐
# │ fruits ┆ has_suffix │
# │ ---    ┆ ---        │
# │ str    ┆ bool       │
# ╞════════╪════════════╡
# │ apple  ┆ false      │
# │ mango  ┆ true       │
# │ null   ┆ null       │
# └────────┴────────────┘

Using `ends_with` as a filter condition:

df.filter(Polars.col("fruits").str.ends_with("go"))
# =>
# shape: (1, 1)
# ┌────────┐
# │ fruits │
# │ ---    │
# │ str    │
# ╞════════╡
# │ mango  │
# └────────┘

Parameters:

sub (String) —
Suffix substring.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 598

def ends_with(sub)
  sub = Utils.expr_to_lit_or_expr(sub, str_to_lit: true)._rbexpr
  Utils.wrap_expr(_rbexpr.str_ends_with(sub))
end

#explode ⇒ `Expr`

Returns a column with a separate row for every string character.

Examples:

df = Polars::DataFrame.new({"a" => ["foo", "bar"]})
df.select(Polars.col("a").str.explode)
# =>
# shape: (6, 1)
# ┌─────┐
# │ a   │
# │ --- │
# │ str │
# ╞═════╡
# │ f   │
# │ o   │
# │ o   │
# │ b   │
# │ a   │
# │ r   │
# └─────┘

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 1090

def explode
  Utils.wrap_expr(_rbexpr.str_explode)
end

#extract(pattern, group_index: 1) ⇒ `Expr`

Extract the target capture group from provided patterns.
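Capture-group numbering works as in most regex engines: group 0 is the whole match and group 1 is the first parenthesized group. A plain-Ruby sketch using `MatchData#[]` illustrates this; `extract_sketch` is a hypothetical helper, and Ruby's regex dialect differs from the one Polars uses in some details:

```ruby
# Hypothetical helper, not part of Polars: returns one capture group of the first match.
def extract_sketch(s, pattern, group_index: 1)
  return nil if s.nil?

  match = Regexp.new(pattern).match(s) # first match only
  match && match[group_index]          # group 0 is the whole match; nil if nothing matched
end

extract_sketch("123 bla 45 asd", '(\d+) (\w+)')                 # => "123"
extract_sketch("123 bla 45 asd", '(\d+) (\w+)', group_index: 2) # => "bla"
extract_sketch("123 bla 45 asd", '(\d+) (\w+)', group_index: 0) # => "123 bla"
```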

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.extract('(\d+)')
  ]
)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ 123 │
# │ 678 │
# └─────┘

Parameters:

pattern (String) —
A valid regex pattern.
group_index (Integer) (defaults to: 1) —
Index of the targeted capture group. Group 0 means the whole pattern; the first group begins at index 1. Defaults to the first capture group.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 807

def extract(pattern, group_index: 1)
  Utils.wrap_expr(_rbexpr.str_extract(pattern, group_index))
end

#extract_all(pattern) ⇒ `Expr`

Extracts all matches for the given regex pattern.

Extracts each successive non-overlapping regex match in an individual string as an array.
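A plain-Ruby approximation collects the whole text of each match with `String#scan`, even when the pattern contains a capture group, which matches the example below. `extract_all_sketch` is a hypothetical helper and not part of Polars:

```ruby
# Hypothetical helper, not part of Polars: lists every non-overlapping match.
def extract_all_sketch(s, pattern)
  return nil if s.nil?

  matches = []
  # In block form, String#scan sets Regexp.last_match for each match; [0] is the whole match
  s.scan(Regexp.new(pattern)) { matches << Regexp.last_match[0] }
  matches
end

extract_all_sketch("123 bla 45 asd", '(\d+)') # => ["123", "45"]
```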

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select(
  [
    Polars.col("foo").str.extract_all('(\d+)').alias("extracted_nrs")
  ]
)
# =>
# shape: (2, 1)
# ┌────────────────┐
# │ extracted_nrs  │
# │ ---            │
# │ list[str]      │
# ╞════════════════╡
# │ ["123", "45"]  │
# │ ["678", "910"] │
# └────────────────┘

Parameters:

pattern (String) —
A valid regex pattern.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 838

def extract_all(pattern)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_extract_all(pattern._rbexpr))
end

#json_extract(dtype = nil, infer_schema_length: 100) ⇒ `Expr`

Parse string values as JSON.

Raises an error if it encounters an invalid JSON string.

Examples:

df = Polars::DataFrame.new(
  {"json" => ['{"a":1, "b": true}', nil, '{"a":2, "b": false}']}
)
dtype = Polars::Struct.new([Polars::Field.new("a", Polars::Int64), Polars::Field.new("b", Polars::Boolean)])
df.select(Polars.col("json").str.json_extract(dtype))
# =>
# shape: (3, 1)
# ┌─────────────┐
# │ json        │
# │ ---         │
# │ struct[2]   │
# ╞═════════════╡
# │ {1,true}    │
# │ {null,null} │
# │ {2,false}   │
# └─────────────┘

Parameters:

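dtype (Object) (defaults to: nil) —
The dtype to cast the extracted value to. If nil, the dtype will be inferred from the JSON value.
infer_schema_length (Integer) (defaults to: 100) —
The maximum number of rows to scan when inferring the dtype (used when dtype is nil).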

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 670

def json_extract(dtype = nil, infer_schema_length: 100)
  if !dtype.nil?
    dtype = Utils.rb_type_to_dtype(dtype)
  end
  Utils.wrap_expr(_rbexpr.str_json_extract(dtype, infer_schema_length))
end

#json_path_match(json_path) ⇒ `Expr`

Extract the first match of a JSON string with the provided JSONPath expression.

Raises an error if it encounters an invalid JSON string. All return values are cast to Utf8, regardless of the original value.

Refer to the JSONPath standard for the query syntax.

Examples:

df = Polars::DataFrame.new(
  {"json_val" => ['{"a":"1"}', nil, '{"a":2}', '{"a":2.1}', '{"a":true}']}
)
df.select(Polars.col("json_val").str.json_path_match("$.a"))
# =>
# shape: (5, 1)
# ┌──────────┐
# │ json_val │
# │ ---      │
# │ str      │
# ╞══════════╡
# │ 1        │
# │ null     │
# │ 2        │
# │ 2.1      │
# │ true     │
# └──────────┘
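The conversion of every result to a string can be illustrated with Ruby's standard `json` library. This sketch assumes only simple dotted paths such as `$.a` or `$.a.b` and scalar leaf values; `json_path_match_sketch` is a hypothetical helper, not a JSONPath implementation:

```ruby
require "json"

# Hypothetical helper, not part of Polars: handles only "$.key" / "$.key.sub" paths.
def json_path_match_sketch(s, json_path)
  return nil if s.nil? # a null input stays null

  keys = json_path.delete_prefix("$.").split(".")
  # Walk down the parsed document; JSON.parse raises JSON::ParserError on invalid input
  value = keys.reduce(JSON.parse(s)) { |node, key| node.is_a?(Hash) ? node[key] : nil }
  value&.to_s # scalar results come back as strings, as in the example above
end

['{"a":"1"}', nil, '{"a":2}', '{"a":2.1}', '{"a":true}'].map { |s| json_path_match_sketch(s, "$.a") }
# => ["1", nil, "2", "2.1", "true"]
```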

Parameters:

json_path (String) —
A valid JSON path query string.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 708

def json_path_match(json_path)
  Utils.wrap_expr(_rbexpr.str_json_path_match(json_path))
end

#lengths ⇒ `Expr`

Note:

The returned lengths are equal to the number of bytes in the UTF8 string. If you need the length in terms of the number of characters, use n_chars instead.

Get length of the strings as :u32 (as number of bytes).
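The byte/character distinction matches Ruby's `String#bytesize` versus `String#length` for UTF-8 strings. A minimal sketch, with `byte_and_char_lengths` as a hypothetical helper name:

```ruby
# Hypothetical helper, not part of Polars: byte and character lengths, keeping nil as nil.
def byte_and_char_lengths(values)
  bytes = values.map { |s| s&.bytesize } # like lengths: number of UTF-8 bytes
  chars = values.map { |s| s&.length }   # like n_chars: number of characters
  [bytes, chars]
end

byte_and_char_lengths(["Café", nil, "345", "東京"])
# => [[5, nil, 3, 6], [4, nil, 3, 2]]
```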

Examples:

df = Polars::DataFrame.new({"s" => ["Café", nil, "345", "東京"]}).with_columns(
  [
    Polars.col("s").str.lengths.alias("length"),
    Polars.col("s").str.n_chars.alias("nchars")
  ]
)
df
# =>
# shape: (4, 3)
# ┌──────┬────────┬────────┐
# │ s    ┆ length ┆ nchars │
# │ ---  ┆ ---    ┆ ---    │
# │ str  ┆ u32    ┆ u32    │
# ╞══════╪════════╪════════╡
# │ Café ┆ 5      ┆ 4      │
# │ null ┆ null   ┆ null   │
# │ 345  ┆ 3      ┆ 3      │
# │ 東京 ┆ 6      ┆ 2      │
# └──────┴────────┴────────┘

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 237

def lengths
  Utils.wrap_expr(_rbexpr.str_lengths)
end

#ljust(width, fillchar = " ") ⇒ `Expr`

Return the string left justified in a string of length width.

Padding is done using the specified fillchar. The original string is returned if width is less than or equal to s.length.
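Ruby's `String#ljust` follows the same padding rule (and `String#rjust` is the counterpart for `rjust`), so the example below can be approximated in plain Ruby. `ljust_sketch` is a hypothetical helper that also passes nil through, mirroring null handling:

```ruby
# Hypothetical helper, not part of Polars: left-justifies each value, passing nil through.
def ljust_sketch(values, width, fillchar = " ")
  # String#ljust leaves strings that are already at least `width` long unchanged
  values.map { |s| s&.ljust(width, fillchar) }
end

ljust_sketch(["cow", "monkey", nil, "hippopotamus"], 8, "*")
# => ["cow*****", "monkey**", nil, "hippopotamus"]
```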

Examples:

df = Polars::DataFrame.new({"a" => ["cow", "monkey", nil, "hippopotamus"]})
df.select(Polars.col("a").str.ljust(8, "*"))
# =>
# shape: (4, 1)
# ┌──────────────┐
# │ a            │
# │ ---          │
# │ str          │
# ╞══════════════╡
# │ cow*****     │
# │ monkey**     │
# │ null         │
# │ hippopotamus │
# └──────────────┘

Parameters:

width (Integer) —
Justify left to this length.
fillchar (String) (defaults to: " ") —
Fill with this ASCII character.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 492

def ljust(width, fillchar = " ")
  Utils.wrap_expr(_rbexpr.str_ljust(width, fillchar))
end

#lstrip(matches = nil) ⇒ `Expr`

Remove leading whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.lstrip)
# =>
# shape: (3, 1)
# ┌────────┐
# │ foo    │
# │ ---    │
# │ str    │
# ╞════════╡
# │ lead   │
# │ trail  │
# │ both   │
# └────────┘

Parameters:

matches (String, nil) (defaults to: nil) —
An optional single character that should be trimmed.
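A plain-Ruby sketch of the two modes follows. It assumes that, when `matches` is given, every leading copy of that character is removed (not just one), and it uses Ruby's notion of whitespace, which may differ slightly from Polars'. `lstrip_sketch` is a hypothetical helper:

```ruby
# Hypothetical helper, not part of Polars: trims leading whitespace or a given character.
def lstrip_sketch(s, matches = nil)
  return nil if s.nil?            # a null input stays null
  return s.lstrip if matches.nil? # default: drop leading whitespace

  raise ArgumentError, "matches should contain a single character" if matches.length > 1
  s.sub(/\A#{Regexp.escape(matches)}+/, "") # drop every leading copy of that character
end

[" lead", "trail ", " both "].map { |s| lstrip_sketch(s) } # => ["lead", "trail ", "both "]
lstrip_sketch("xxabx", "x")                                # => "abx"
```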

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 387

def lstrip(matches = nil)
  if !matches.nil? && matches.length > 1
    raise ArgumentError, "matches should contain a single character"
  end
  Utils.wrap_expr(_rbexpr.str_lstrip(matches))
end

#n_chars ⇒ `Expr`

Note:

If you know that you are working with ASCII text, lengths gives the same result and is faster, because it returns the length in bytes.

Get length of the strings as :u32 (as number of chars).

Examples:

df = Polars::DataFrame.new({"s" => ["Café", nil, "345", "東京"]}).with_columns(
  [
    Polars.col("s").str.lengths.alias("length"),
    Polars.col("s").str.n_chars.alias("nchars")
  ]
)
df
# =>
# shape: (4, 3)
# ┌──────┬────────┬────────┐
# │ s    ┆ length ┆ nchars │
# │ ---  ┆ ---    ┆ ---    │
# │ str  ┆ u32    ┆ u32    │
# ╞══════╪════════╪════════╡
# │ Café ┆ 5      ┆ 4      │
# │ null ┆ null   ┆ null   │
# │ 345  ┆ 3      ┆ 3      │
# │ 東京 ┆ 6      ┆ 2      │
# └──────┴────────┴────────┘

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 269

def n_chars
  Utils.wrap_expr(_rbexpr.str_n_chars)
end

#parse_int(radix = 2, strict: true) ⇒ `Expr`

Parse integers with base radix from strings.

The default base is 2. With strict: false, values that fail to parse or overflow become nulls; with strict: true (the default), they raise an error.

Examples:

df = Polars::DataFrame.new({"bin" => ["110", "101", "010", "invalid"]})
df.select(Polars.col("bin").str.parse_int(2, strict: false))
# =>
# shape: (4, 1)
# ┌──────┐
# │ bin  │
# │ ---  │
# │ i32  │
# ╞══════╡
# │ 6    │
# │ 5    │
# │ 2    │
# │ null │
# └──────┘

df = Polars::DataFrame.new({"hex" => ["fa1e", "ff00", "cafe", nil]})
df.select(Polars.col("hex").str.parse_int(16, strict: true))
# =>
# shape: (4, 1)
# ┌───────┐
# │ hex   │
# │ ---   │
# │ i32   │
# ╞═══════╡
# │ 64030 │
# │ 65280 │
# │ 51966 │
# │ null  │
# └───────┘
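The interaction between `radix` and `strict` can be approximated with Ruby's `Integer(string, base)`. This is a sketch under loose assumptions: Ruby integers never overflow, so the overflow case is not modelled, and `Integer` also accepts some inputs (such as underscores or surrounding whitespace) that Polars may reject. `parse_int_sketch` is a hypothetical helper:

```ruby
# Hypothetical helper, not part of Polars: parses each string in the given base.
def parse_int_sketch(values, radix = 2, strict: true)
  values.map do |s|
    next nil if s.nil? # nulls stay null

    begin
      Integer(s, radix) # raises ArgumentError for digits that are invalid in this base
    rescue ArgumentError
      raise if strict # strict: true surfaces the error
      nil             # strict: false turns the value into nil
    end
  end
end

parse_int_sketch(["110", "101", "010", "invalid"], 2, strict: false) # => [6, 5, 2, nil]
parse_int_sketch(["fa1e", "ff00", "cafe", nil], 16)                  # => [64030, 65280, 51966, nil]
```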

Parameters:

radix (Integer) (defaults to: 2) —
Positive integer that is the base of the strings being parsed.
strict (Boolean) (defaults to: true) —
If true, any parse error or overflow is raised as a ComputeError. If false, such values are silently converted to null.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 1138

def parse_int(radix = 2, strict: true)
  Utils.wrap_expr(_rbexpr.str_parse_int(radix, strict))
end

#replace(pattern, value, literal: false, n: 1) ⇒ `Expr`

Replace the first n matching regex/literal substrings (only the first one by default) with a new string value.

Examples:

df = Polars::DataFrame.new({"id" => [1, 2], "text" => ["123abc", "abc456"]})
df.with_column(
  Polars.col("text").str.replace('abc\b', "ABC")
)
# =>
# shape: (2, 2)
# ┌─────┬────────┐
# │ id  ┆ text   │
# │ --- ┆ ---    │
# │ i64 ┆ str    │
# ╞═════╪════════╡
# │ 1   ┆ 123ABC │
# │ 2   ┆ abc456 │
# └─────┴────────┘
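Replacing only the first `n` matches can be sketched in plain Ruby with a counting `gsub` block. `replace_n_sketch` is a hypothetical helper; it inserts the replacement literally, so it does not model any special replacement syntax, and Ruby's regex dialect differs from the one Polars uses in places:

```ruby
# Hypothetical helper, not part of Polars: replaces the first n matches of a pattern.
def replace_n_sketch(s, pattern, value, literal: false, n: 1)
  return nil if s.nil? # a null input stays null

  regex = literal ? Regexp.new(Regexp.escape(pattern)) : Regexp.new(pattern)
  seen = 0
  s.gsub(regex) do |match|
    seen += 1
    seen <= n ? value : match # matches beyond the first n are left untouched
  end
end

replace_n_sketch("123abc", 'abc\b', "ABC") # => "123ABC"
replace_n_sketch("abcabc", "a", "-", n: 2) # => "-bc-bc"
```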

Parameters:

pattern (String) —
Regex pattern.
value (String) —
Replacement string.
literal (Boolean) (defaults to: false) —
Treat pattern as a literal string.
n (Integer) (defaults to: 1) —
Number of matches to replace.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 1002

def replace(pattern, value, literal: false, n: 1)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  value = Utils.expr_to_lit_or_expr(value, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_n(pattern._rbexpr, value._rbexpr, literal, n))
end

#replace_all(pattern, value, literal: false) ⇒ `Expr`

Replace all matching regex/literal substrings with a new string value.

Examples:

df = Polars::DataFrame.new({"id" => [1, 2], "text" => ["abcabc", "123a123"]})
df.with_column(Polars.col("text").str.replace_all("a", "-"))
# =>
# shape: (2, 2)
# ┌─────┬─────────┐
# │ id  ┆ text    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 1   ┆ -bc-bc  │
# │ 2   ┆ 123-123 │
# └─────┴─────────┘

Parameters:

pattern (String) —
Regex pattern.
value (String) —
Replacement string.
literal (Boolean) (defaults to: false) —
Treat pattern as a literal string.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 1032

def replace_all(pattern, value, literal: false)
  pattern = Utils.expr_to_lit_or_expr(pattern, str_to_lit: true)
  value = Utils.expr_to_lit_or_expr(value, str_to_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_all(pattern._rbexpr, value._rbexpr, literal))
end

#rjust(width, fillchar = " ") ⇒ `Expr`

Return the string right justified in a string of length width.

Padding is done using the specified fillchar. The original string is returned if width is less than or equal to s.length.

Examples:

df = Polars::DataFrame.new({"a" => ["cow", "monkey", nil, "hippopotamus"]})
df.select(Polars.col("a").str.rjust(8, "*"))
# =>
# shape: (4, 1)
# ┌──────────────┐
# │ a            │
# │ ---          │
# │ str          │
# ╞══════════════╡
# │ *****cow     │
# │ **monkey     │
# │ null         │
# │ hippopotamus │
# └──────────────┘

Parameters:

width (Integer) —
Justify right to this length.
fillchar (String) (defaults to: " ") —
Fill with this ASCII character.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 524

def rjust(width, fillchar = " ")
  Utils.wrap_expr(_rbexpr.str_rjust(width, fillchar))
end

#rstrip(matches = nil) ⇒ `Expr`

Remove trailing whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.rstrip)
# =>
# shape: (3, 1)
# ┌───────┐
# │ foo   │
# │ ---   │
# │ str   │
# ╞═══════╡
# │  lead │
# │ trail │
# │  both │
# └───────┘

Parameters:

matches (String, nil) (defaults to: nil) —
An optional single character that should be trimmed.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 415

def rstrip(matches = nil)
  if !matches.nil? && matches.length > 1
    raise ArgumentError, "matches should contain a single character"
  end
  Utils.wrap_expr(_rbexpr.str_rstrip(matches))
end

#slice(offset, length = nil) ⇒ `Expr`

Create subslices of the string values of a Utf8 Series.

Examples:

df = Polars::DataFrame.new({"s" => ["pear", nil, "papaya", "dragonfruit"]})
df.with_column(
  Polars.col("s").str.slice(-3).alias("s_sliced")
)
# =>
# shape: (4, 2)
# ┌─────────────┬──────────┐
# │ s           ┆ s_sliced │
# │ ---         ┆ ---      │
# │ str         ┆ str      │
# ╞═════════════╪══════════╡
# │ pear        ┆ ear      │
# │ null        ┆ null     │
# │ papaya      ┆ aya      │
# │ dragonfruit ┆ uit      │
# └─────────────┴──────────┘

Parameters:

offset (Integer) —
Start index. Negative indexing is supported.
length (Integer) (defaults to: nil) —
Length of the slice. If set to nil (default), the slice is taken to the end of the string.
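In plain Ruby, `String#[]` gives the same result for the example above. This sketch assumes character-based offsets that fall inside the string; for a negative offset that reaches past the start, Ruby returns nil, which may not match Polars. `slice_sketch` is a hypothetical helper:

```ruby
# Hypothetical helper, not part of Polars: character-based substring of one value.
def slice_sketch(s, offset, length = nil)
  return nil if s.nil? # a null input stays null

  # Without a length, take everything from offset to the end (negative offsets count from the end)
  length.nil? ? s[offset..] : s[offset, length]
end

["pear", nil, "papaya", "dragonfruit"].map { |s| slice_sketch(s, -3) }
# => ["ear", nil, "aya", "uit"]
```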

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 1065

def slice(offset, length = nil)
  Utils.wrap_expr(_rbexpr.str_slice(offset, length))
end

#split(by, inclusive: false) ⇒ `Expr`

Split the string by a substring.

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", "foo-bar", "foo bar baz"]})
df.select(Polars.col("s").str.split(" "))
# =>
# shape: (3, 1)
# ┌───────────────────────┐
# │ s                     │
# │ ---                   │
# │ list[str]             │
# ╞═══════════════════════╡
# │ ["foo", "bar"]        │
# │ ["foo-bar"]           │
# │ ["foo", "bar", "baz"] │
# └───────────────────────┘

Parameters:

by (String) —
Substring to split by.
inclusive (Boolean) (defaults to: false) —
If true, include the split character/string in the results.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 894

def split(by, inclusive: false)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_inclusive(by))
  else
    Utils.wrap_expr(_rbexpr.str_split(by))
  end
end

#split_exact(by, n, inclusive: false) ⇒ `Expr`

Split the string by a substring using n splits.

Results in a struct of n+1 fields.

If it cannot make n splits, the remaining field elements will be null.

Examples:

df = Polars::DataFrame.new({"x" => ["a_1", nil, "c", "d_4"]})
df.select(
  [
    Polars.col("x").str.split_exact("_", 1).alias("fields")
  ]
)
# =>
# shape: (4, 1)
# ┌─────────────┐
# │ fields      │
# │ ---         │
# │ struct[2]   │
# ╞═════════════╡
# │ {"a","1"}   │
# │ {null,null} │
# │ {"c",null}  │
# │ {"d","4"}   │
# └─────────────┘

Parameters:

by (String) —
Substring to split by.
n (Integer) —
Number of splits to make.
inclusive (Boolean) (defaults to: false) —
If true, include the split character/string in the results.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 936

def split_exact(by, n, inclusive: false)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_exact_inclusive(by, n))
  else
    Utils.wrap_expr(_rbexpr.str_split_exact(by, n))
  end
end

#splitn(by, n) ⇒ `Expr`

Split the string by a substring, restricted to returning at most n items.

If the number of possible splits is less than n-1, the remaining field elements will be null. If the number of possible splits is n-1 or greater, the last (nth) substring will contain the remainder of the string.
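Ruby's `String#split` with a positive limit behaves the same way, so the rule can be sketched as below, padding missing fields with nil. `splitn_sketch` is a hypothetical helper; it splits on a Regexp because `split(" ")` with a plain single-space string switches Ruby to awk-style whitespace splitting:

```ruby
# Hypothetical helper, not part of Polars: splits into at most n fields, padding with nil.
def splitn_sketch(s, by, n)
  return Array.new(n) if s.nil? # a null input gives all-null fields

  # A Regexp avoids Ruby's awk-style special case for split(" ");
  # a positive limit keeps the remainder of the string in the last field.
  parts = s.split(Regexp.new(Regexp.escape(by)), n)
  parts + Array.new(n - parts.length) # fewer than n pieces: pad the rest with nil
end

["foo bar", nil, "foo-bar", "foo bar baz"].map { |s| splitn_sketch(s, " ", 2) }
# => [["foo", "bar"], [nil, nil], ["foo-bar", nil], ["foo", "bar baz"]]
```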

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", nil, "foo-bar", "foo bar baz"]})
df.select(Polars.col("s").str.splitn(" ", 2).alias("fields"))
# =>
# shape: (4, 1)
# ┌───────────────────┐
# │ fields            │
# │ ---               │
# │ struct[2]         │
# ╞═══════════════════╡
# │ {"foo","bar"}     │
# │ {null,null}       │
# │ {"foo-bar",null}  │
# │ {"foo","bar baz"} │
# └───────────────────┘

Parameters:

by (String) —
Substring to split by.
n (Integer) —
Max number of items to return.

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 972

def splitn(by, n)
  Utils.wrap_expr(_rbexpr.str_splitn(by, n))
end

#starts_with(sub) ⇒ `Expr`

Check if string values start with a substring.

Examples:

df = Polars::DataFrame.new({"fruits" => ["apple", "mango", nil]})
df.with_column(
  Polars.col("fruits").str.starts_with("app").alias("has_prefix")
)
# =>
# shape: (3, 2)
# ┌────────┬────────────┐
# │ fruits ┆ has_prefix │
# │ ---    ┆ ---        │
# │ str    ┆ bool       │
# ╞════════╪════════════╡
# │ apple  ┆ true       │
# │ mango  ┆ false      │
# │ null   ┆ null       │
# └────────┴────────────┘

Using `starts_with` as a filter condition:

df.filter(Polars.col("fruits").str.starts_with("app"))
# =>
# shape: (1, 1)
# ┌────────┐
# │ fruits │
# │ ---    │
# │ str    │
# ╞════════╡
# │ apple  │
# └────────┘

Parameters:

sub (String) —
Prefix substring.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 638

def starts_with(sub)
  sub = Utils.expr_to_lit_or_expr(sub, str_to_lit: true)._rbexpr
  Utils.wrap_expr(_rbexpr.str_starts_with(sub))
end

#strip(matches = nil) ⇒ `Expr`

Remove leading and trailing whitespace.

Examples:

df = Polars::DataFrame.new({"foo" => [" lead", "trail ", " both "]})
df.select(Polars.col("foo").str.strip)
# =>
# shape: (3, 1)
# ┌───────┐
# │ foo   │
# │ ---   │
# │ str   │
# ╞═══════╡
# │ lead  │
# │ trail │
# │ both  │
# └───────┘

Parameters:

matches (String, nil) (defaults to: nil) —
An optional single character that should be trimmed.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 359

def strip(matches = nil)
  if !matches.nil? && matches.length > 1
    raise ArgumentError, "matches should contain a single character"
  end
  Utils.wrap_expr(_rbexpr.str_strip(matches))
end

#strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false) ⇒ `Expr`

Note:

When parsing a Datetime, the column precision is inferred from the format string, if given, e.g. "%F %T%.3f" => Datetime("ms"). If no fractional second component is found, the default is "us".

Parse a Utf8 expression to a Date/Datetime/Time type.

Examples:

Dealing with a consistent format:

s = Polars::Series.new(["2020-01-01 01:00Z", "2020-01-01 02:00Z"])
s.str.strptime(Polars::Datetime, "%Y-%m-%d %H:%M%#z")
# =>
# shape: (2,)
# Series: '' [datetime[μs, UTC]]
# [
#         2020-01-01 01:00:00 UTC
#         2020-01-01 02:00:00 UTC
# ]

Dealing with different formats:

s = Polars::Series.new(
  "date",
  [
    "2021-04-22",
    "2022-01-04 00:00:00",
    "01/31/22",
    "Sun Jul  8 00:34:60 2001",
  ]
)
s.to_frame.select(
  Polars.coalesce(
    Polars.col("date").str.strptime(Polars::Date, "%F", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%F %T", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%D", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%c", strict: false)
  )
).to_series
# =>
# shape: (4,)
# Series: 'date' [date]
# [
#         2021-04-22
#         2022-01-04
#         2022-01-31
#         2001-07-08
# ]

Parameters:

dtype (Object) —
The data type to convert into. Can be either Date, Datetime, or Time.
format (String) (defaults to: nil) —
Format to use, refer to the chrono strftime documentation for specification. Example: "%y-%m-%d".
strict (Boolean) (defaults to: true) —
Raise an error if any conversion fails.
exact (Boolean) (defaults to: true) —
- If true, require an exact format match.
- If false, allow the format to match anywhere in the target string.
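cache (Boolean) (defaults to: true) —
Use a cache of unique, converted values to apply the conversion.
utc (Boolean) (defaults to: false) —
Parse timezone-aware datetimes as UTC. This may be useful if you have data with mixed offsets.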

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 192

def strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false)
  _validate_format_argument(format)

  if dtype == Date
    to_date(format, strict: strict, exact: exact, cache: cache)
  elsif dtype == Datetime || dtype.is_a?(Datetime)
    dtype = Datetime.new if dtype == Datetime
    time_unit = dtype.time_unit
    time_zone = dtype.time_zone
    to_datetime(format, time_unit: time_unit, time_zone: time_zone, strict: strict, exact: exact, cache: cache)
  elsif dtype == Time
    to_time(format, strict: strict, cache: cache)
  else
    raise ArgumentError, "dtype should be of type {Date, Datetime, Time}"
  end
end

#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ `Expr`

Convert a Utf8 column into a Date column.

Examples:

s = Polars::Series.new(["2020/01/01", "2020/02/01", "2020/03/01"])
s.str.to_date
# =>
# shape: (3,)
# Series: '' [date]
# [
#         2020-01-01
#         2020-02-01
#         2020-03-01
# ]

Parameters:

format (String) (defaults to: nil) —
Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d". If set to nil (default), the format is inferred from the data.
strict (Boolean) (defaults to: true) —
Raise an error if any conversion fails.
exact (Boolean) (defaults to: true) —
Require an exact format match. If false, allow the format to match anywhere in the target string.
cache (Boolean) (defaults to: true) —
Use a cache of unique, converted dates to apply the conversion.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 40

def to_date(format = nil, strict: true, exact: true, cache: true)
  _validate_format_argument(format)
  Utils.wrap_expr(self._rbexpr.str_to_date(format, strict, exact, cache))
end

#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true) ⇒ `Expr`

Convert a Utf8 column into a Datetime column.

Examples:

s = Polars::Series.new(["2020-01-01 01:00Z", "2020-01-01 02:00Z"])
s.str.to_datetime("%Y-%m-%d %H:%M%#z")
# =>
# shape: (2,)
# Series: '' [datetime[μs, UTC]]
# [
#         2020-01-01 01:00:00 UTC
#         2020-01-01 02:00:00 UTC
# ]

Parameters:

format (String) (defaults to: nil) —
Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d %H:%M:%S". If set to nil (default), the format is inferred from the data.
time_unit ("us", "ns", "ms") (defaults to: nil) —
Unit of time for the resulting Datetime column. If set to nil (default), the time unit is inferred from the format string if given, e.g. "%F %T%.3f" => Datetime("ms"). If no fractional second component is found, the default is "us".
time_zone (String) (defaults to: nil) —
Time zone for the resulting Datetime column.
strict (Boolean) (defaults to: true) —
Raise an error if any conversion fails.
exact (Boolean) (defaults to: true) —
Require an exact format match. If false, allow the format to match anywhere in the target string.
cache (Boolean) (defaults to: true) —
Use a cache of unique, converted datetimes to apply the conversion.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 79

def to_datetime(
  format = nil,
  time_unit: nil,
  time_zone: nil,
  strict: true,
  exact: true,
  cache: true
)
  _validate_format_argument(format)
  Utils.wrap_expr(
    self._rbexpr.str_to_datetime(
      format,
      time_unit,
      time_zone,
      strict,
      exact,
      cache
    )
  )
end

#to_lowercase ⇒ `Expr`

Transform to lowercase variant.

Examples:

df = Polars::DataFrame.new({"foo" => ["CAT", "DOG"]})
df.select(Polars.col("foo").str.to_lowercase)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ cat │
# │ dog │
# └─────┘

Returns:

(Expr)



# File 'lib/polars/string_expr.rb', line 334

def to_lowercase
  Utils.wrap_expr(_rbexpr.str_to_lowercase)
end

#to_time(format = nil, strict: true, cache: true) ⇒ `Expr`

Convert a Utf8 column into a Time column.

Examples:

s = Polars::Series.new(["01:00", "02:00", "03:00"])
s.str.to_time("%H:%M")
# =>
# shape: (3,)
# Series: '' [time]
# [
#         01:00:00
#         02:00:00
#         03:00:00
# ]

Parameters:

format (String) (defaults to: nil) —
Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%H:%M:%S". If set to nil (default), the format is inferred from the data.
strict (Boolean) (defaults to: true) —
Raise an error if any conversion fails.
cache (Boolean) (defaults to: true) —
Use a cache of unique, converted times to apply the conversion.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 125

def to_time(format = nil, strict: true, cache: true)
  _validate_format_argument(format)
  Utils.wrap_expr(_rbexpr.str_to_time(format, strict, cache))
end
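
The `strict` and `cache` options can be sketched in plain Ruby with the standard `date` library. `parse_times` is a hypothetical helper, not a Polars API, and makes several assumptions:

- It returns seconds since midnight instead of a Polars Time value.
- It parses with Ruby's `Date._strptime`, whose directives overlap with chrono's but are not identical.
- A Hash memo stands in for Polars' cache of unique conversions.

```ruby
require "date"

# Hypothetical helper (not a Polars API) sketching `strict:` and `cache:`.
# Returns seconds since midnight for each string, or nil for nulls/failures.
def parse_times(values, format, strict: true, cache: true)
  memo = {}
  values.map do |v|
    next nil if v.nil?                    # nulls stay null
    next memo[v] if cache && memo.key?(v) # reuse the conversion of a repeated string

    parts = Date._strptime(v, format)     # nil when the format does not match
    result =
      if parts && parts.key?(:hour) && !parts.key?(:leftover)
        parts[:hour] * 3600 + parts.fetch(:min, 0) * 60 + parts.fetch(:sec, 0)
      end
    if result.nil? && strict
      raise ArgumentError, "could not parse #{v.inspect} with #{format.inspect}"
    end

    memo[v] = result if cache
    result
  end
end

parse_times(["01:00", "02:00", "03:00"], "%H:%M")     # => [3600, 7200, 10800]
parse_times(["01:00", "bad"], "%H:%M", strict: false) # => [3600, nil]
```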

#to_uppercase ⇒ `Expr`

Transform to uppercase variant.

Examples:

df = Polars::DataFrame.new({"foo" => ["cat", "dog"]})
df.select(Polars.col("foo").str.to_uppercase)
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ CAT │
# │ DOG │
# └─────┘

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 313

def to_uppercase
  Utils.wrap_expr(_rbexpr.str_to_uppercase)
end

#zfill(alignment) ⇒ `Expr`

Fills the string with zeroes.

Return a copy of the string left-padded with ASCII '0' digits to a total length of `alignment`.

A leading sign prefix ('+'/'-') is handled by inserting the padding after the sign character rather than before it. The original string is returned unchanged if `alignment` is less than or equal to its length.

Examples:

df = Polars::DataFrame.new(
  {
    "num" => [-10, -1, 0, 1, 10, 100, 1000, 10000, 100000, 1000000, nil]
  }
)
df.with_column(Polars.col("num").cast(String).str.zfill(5))
# =>
# shape: (11, 1)
# ┌─────────┐
# │ num     │
# │ ---     │
# │ str     │
# ╞═════════╡
# │ -0010   │
# │ -0001   │
# │ 00000   │
# │ 00001   │
# │ …       │
# │ 10000   │
# │ 100000  │
# │ 1000000 │
# │ null    │
# └─────────┘

Parameters:

alignment (Integer) —
Pad the value with zeroes up to this length.

Returns:

(Expr)

# File 'lib/polars/string_expr.rb', line 460

def zfill(alignment)
  Utils.wrap_expr(_rbexpr.str_zfill(alignment))
end
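
The padding rule above, including the sign handling, can be sketched in plain Ruby. `zfill_sketch` is a hypothetical helper for illustration, not the Polars implementation. It treats `nil` as a null and returns it unchanged, and it never truncates a string that is already long enough.

```ruby
# Hypothetical plain-Ruby sketch of the zfill rule (not the Polars implementation).
def zfill_sketch(s, alignment)
  return nil if s.nil?              # nulls stay null
  return s if alignment <= s.length # nothing to pad; never truncates

  # Keep a leading '+'/'-' in front and pad the remainder with '0'.
  sign = s.start_with?("+", "-") ? s[0] : ""
  rest = s[sign.length..]
  sign + rest.rjust(alignment - sign.length, "0")
end

zfill_sketch("-10", 5)    # => "-0010"
zfill_sketch("1", 5)      # => "00001"
zfill_sketch("100000", 5) # => "100000"
```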
