Class: Polars::CatExpr

Inherits:
Object
Defined in:
lib/polars/cat_expr.rb

Overview

Namespace for categorical-related expressions.

Instance Method Details

#ends_with(suffix) ⇒ Expr

Note:

Whereas str.ends_with allows expression inputs, cat.ends_with requires a literal string value.

Check if string representations of values end with a substring.

Examples:

df = Polars::DataFrame.new(
  {"fruits" => Polars::Series.new(["apple", "mango", nil], dtype: Polars::Categorical)}
)
df.with_columns(Polars.col("fruits").cat.ends_with("go").alias("has_suffix"))
# =>
# shape: (3, 2)
# ┌────────┬────────────┐
# │ fruits ┆ has_suffix │
# │ ---    ┆ ---        │
# │ cat    ┆ bool       │
# ╞════════╪════════════╡
# │ apple  ┆ false      │
# │ mango  ┆ true       │
# │ null   ┆ null       │
# └────────┴────────────┘

Using ends_with as a filter condition:

df.filter(Polars.col("fruits").cat.ends_with("go"))
# =>
# shape: (1, 1)
# ┌────────┐
# │ fruits │
# │ ---    │
# │ cat    │
# ╞════════╡
# │ mango  │
# └────────┘

Parameters:

  • suffix (String)

    Suffix substring.

Returns:

  • (Expr)



# File 'lib/polars/cat_expr.rb', line 195

def ends_with(suffix)
  if !suffix.is_a?(::String)
    msg = "'suffix' must be a string; found #{suffix.inspect}"
    raise TypeError, msg
  end
  Utils.wrap_expr(_rbexpr.cat_ends_with(suffix))
end
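
The guard above means that a non-String suffix, such as a Symbol, is rejected before any expression is built. The following is a minimal standalone sketch of that check in plain Ruby. `check_suffix!` is a hypothetical helper introduced only for illustration and is not part of the library.

```ruby
# Hypothetical helper mirroring the type guard shown in ends_with.
# It is not part of Polars; it only reproduces the check for illustration.
def check_suffix!(suffix)
  unless suffix.is_a?(::String)
    raise TypeError, "'suffix' must be a string; found #{suffix.inspect}"
  end
  suffix
end

check_suffix!("go") # a String passes through unchanged

begin
  check_suffix!(:go) # a Symbol is not a String, so this raises
rescue TypeError => e
  puts e.message # prints: 'suffix' must be a string; found :go
end
```

Because the check uses `is_a?(::String)`, subclasses of String would pass as well.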

#get_categories ⇒ Expr

Get the categories stored in this data type.

Examples:

df = Polars::Series.new(
  "cats", ["foo", "bar", "foo", "foo", "ham"], dtype: Polars::Categorical
).to_frame
df.select(Polars.col("cats").cat.get_categories)
# =>
# shape: (3, 1)
# ┌──────┐
# │ cats │
# │ ---  │
# │ str  │
# ╞══════╡
# │ foo  │
# │ bar  │
# │ ham  │
# └──────┘

Returns:

  • (Expr)



# File 'lib/polars/cat_expr.rb', line 32

def get_categories
  Utils.wrap_expr(_rbexpr.cat_get_categories)
end
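
In the example output above, each distinct value appears once, in the order it first occurs. As a rough plain-Ruby analogy (not the library's implementation, and assuming nothing about ordering beyond what this one example shows), `Array#uniq` produces the same list:

```ruby
# Plain-Ruby analogy for the example above; this does not call Polars.
values = ["foo", "bar", "foo", "foo", "ham"]

# Array#uniq keeps the first occurrence of each value, in order.
categories = values.uniq

p categories # ["foo", "bar", "ham"]
```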

#len_bytes ⇒ Expr

Note:

When working with non-ASCII text, the length in bytes differs from the length in characters, so you may want to use len_chars instead. Note that len_bytes is much faster (O(1)) than len_chars (O(n)).

Return the byte-length of the string representation of each value.

Examples:

df = Polars::DataFrame.new(
  {"a" => Polars::Series.new(["Café", "345", "東京", nil], dtype: Polars::Categorical)}
)
df.with_columns(
  Polars.col("a").cat.len_bytes.alias("n_bytes"),
  Polars.col("a").cat.len_chars.alias("n_chars")
)
# =>
# shape: (4, 3)
# ┌──────┬─────────┬─────────┐
# │ a    ┆ n_bytes ┆ n_chars │
# │ ---  ┆ ---     ┆ ---     │
# │ cat  ┆ u32     ┆ u32     │
# ╞══════╪═════════╪═════════╡
# │ Café ┆ 5       ┆ 4       │
# │ 345  ┆ 3       ┆ 3       │
# │ 東京 ┆ 6       ┆ 2       │
# │ null ┆ null    ┆ null    │
# └──────┴─────────┴─────────┘

Returns:

  • (Expr)



# File 'lib/polars/cat_expr.rb', line 66

def len_bytes
  Utils.wrap_expr(_rbexpr.cat_len_bytes)
end
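
The byte and character counts in the example can be cross-checked without Polars. This is a small plain-Ruby sketch, assuming UTF-8 string literals (Ruby's default source encoding); it uses `String#bytesize` and `String#length` as stand-ins for len_bytes and len_chars and is not how the library computes them.

```ruby
# Plain-Ruby cross-check of the byte and character counts; no Polars involved.
values = ["Café", "345", "東京", nil]

rows = values.map do |v|
  # The safe-navigation operator keeps nil as nil, like null in the table.
  { value: v, n_bytes: v&.bytesize, n_chars: v&.length }
end

rows.each do |row|
  puts "#{row[:value].inspect}: bytes=#{row[:n_bytes].inspect} chars=#{row[:n_chars].inspect}"
end
```

The counts agree only for the all-ASCII value "345".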

#len_chars ⇒ Expr

Note:

When working with ASCII text, use len_bytes instead to get the same output with much better performance: len_bytes runs in O(1), while len_chars runs in O(n).

A character is defined as a Unicode scalar value. A single character is represented by a single byte when working with ASCII text, and a maximum of 4 bytes otherwise.

Return the number of characters of the string representation of each value.

Examples:

df = Polars::DataFrame.new(
  {"a" => Polars::Series.new(["Café", "345", "東京", nil], dtype: Polars::Categorical)}
)
df.with_columns(
  Polars.col("a").cat.len_chars.alias("n_chars"),
  Polars.col("a").cat.len_bytes.alias("n_bytes")
)
# =>
# shape: (4, 3)
# ┌──────┬─────────┬─────────┐
# │ a    ┆ n_chars ┆ n_bytes │
# │ ---  ┆ ---     ┆ ---     │
# │ cat  ┆ u32     ┆ u32     │
# ╞══════╪═════════╪═════════╡
# │ Café ┆ 4       ┆ 5       │
# │ 345  ┆ 3       ┆ 3       │
# │ 東京 ┆ 2       ┆ 6       │
# │ null ┆ null    ┆ null    │
# └──────┴─────────┴─────────┘

Returns:

  • (Expr)



# File 'lib/polars/cat_expr.rb', line 103

def len_chars
  Utils.wrap_expr(_rbexpr.cat_len_chars)
end
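
The note that a character takes one byte in ASCII and up to four bytes otherwise can be illustrated in plain Ruby. This sketch assumes UTF-8 string literals and uses sample characters chosen here for illustration; it does not call Polars.

```ruby
# One sample character for each UTF-8 width from 1 to 4 bytes.
samples = ["a", "é", "東", "😀"]

widths = samples.map(&:bytesize)
samples.zip(widths).each do |ch, width|
  puts "#{ch}: #{width} byte(s), ascii_only=#{ch.ascii_only?}"
end

# For ASCII-only text the two lengths coincide, which is why the
# cheaper byte length gives the same answer there.
ascii = "345"
same_length = ascii.ascii_only? && ascii.length == ascii.bytesize
puts "lengths agree for #{ascii.inspect}: #{same_length}"
```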

#slice(offset, length = nil) ⇒ Expr

Note:

Both the offset and length inputs are defined in terms of the number of characters in the (UTF-8) string. A character is defined as a Unicode scalar value. A single character is represented by a single byte when working with ASCII text, and a maximum of 4 bytes otherwise.

Extract a substring from the string representation of each value.

Examples:

df = Polars::DataFrame.new(
  {
    "s" => Polars::Series.new(
      ["pear", nil, "papaya", "dragonfruit"],
      dtype: Polars::Categorical
    )
  }
)
df.with_columns(Polars.col("s").cat.slice(-3).alias("slice"))
# =>
# shape: (4, 2)
# ┌─────────────┬───────┐
# │ s           ┆ slice │
# │ ---         ┆ ---   │
# │ cat         ┆ str   │
# ╞═════════════╪═══════╡
# │ pear        ┆ ear   │
# │ null        ┆ null  │
# │ papaya      ┆ aya   │
# │ dragonfruit ┆ uit   │
# └─────────────┴───────┘

Using the optional length parameter:

df.with_columns(Polars.col("s").cat.slice(4, 3).alias("slice"))
# =>
# shape: (4, 2)
# ┌─────────────┬───────┐
# │ s           ┆ slice │
# │ ---         ┆ ---   │
# │ cat         ┆ str   │
# ╞═════════════╪═══════╡
# │ pear        ┆       │
# │ null        ┆ null  │
# │ papaya      ┆ ya    │
# │ dragonfruit ┆ onf   │
# └─────────────┴───────┘

Parameters:

  • offset (Integer)

    Start index. Negative indexing is supported.

  • length (Integer) (defaults to: nil)

    Length of the slice. If set to nil (default), the slice is taken to the end of the string.

Returns:

  • (Expr)



# File 'lib/polars/cat_expr.rb', line 256

def slice(offset, length = nil)
  Utils.wrap_expr(_rbexpr.cat_slice(offset, length))
end
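
To make the character-based (rather than byte-based) offsets concrete, here is a plain-Ruby sketch that reproduces the two example outputs above with `String#[]`. `char_slice` is a hypothetical helper written for this illustration, not a Polars method, and its behavior for out-of-range offsets may differ from the library's.

```ruby
# Hypothetical helper: character-based slicing on a plain Ruby string.
# Mirrors only the documented examples; edge cases may differ from Polars.
def char_slice(value, offset, length = nil)
  return nil if value.nil? # nil stays nil, like null in the tables
  length.nil? ? value[offset..] : value[offset, length]
end

fruits = ["pear", nil, "papaya", "dragonfruit"]

p fruits.map { |s| char_slice(s, -3) }   # ["ear", nil, "aya", "uit"]
p fruits.map { |s| char_slice(s, 4, 3) } # ["", nil, "ya", "onf"]

# Offsets count characters: the last character of "Café" is the two-byte "é".
p char_slice("Café", -1)                  # "é"
# Slicing by bytes instead would cut "é" in half and leave invalid UTF-8.
p "Café".byteslice(-1, 1).valid_encoding? # false
```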

#starts_with(prefix) ⇒ Expr

Note:

Whereas str.starts_with allows expression inputs, cat.starts_with requires a literal string value.

Check if string representations of values start with a substring.

Examples:

df = Polars::DataFrame.new(
  {"fruits" => Polars::Series.new(["apple", "mango", nil], dtype: Polars::Categorical)}
)
df.with_columns(
  Polars.col("fruits").cat.starts_with("app").alias("has_prefix")
)
# =>
# shape: (3, 2)
# ┌────────┬────────────┐
# │ fruits ┆ has_prefix │
# │ ---    ┆ ---        │
# │ cat    ┆ bool       │
# ╞════════╪════════════╡
# │ apple  ┆ true       │
# │ mango  ┆ false      │
# │ null   ┆ null       │
# └────────┴────────────┘

Using starts_with as a filter condition:

df.filter(Polars.col("fruits").cat.starts_with("app"))
# =>
# shape: (1, 1)
# ┌────────┐
# │ fruits │
# │ ---    │
# │ cat    │
# ╞════════╡
# │ apple  │
# └────────┘

Parameters:

  • prefix (String)

    Prefix substring.

Returns:

  • (Expr)



# File 'lib/polars/cat_expr.rb', line 148

def starts_with(prefix)
  if !prefix.is_a?(::String)
    msg = "'prefix' must be a string; found #{prefix.inspect}"
    raise TypeError, msg
  end
  Utils.wrap_expr(_rbexpr.cat_starts_with(prefix))
end
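
The examples above show two behaviors worth noting: a null input yields null rather than false, and filtering on the result drops the null row. A plain-Ruby analogy, using `String#start_with?` with the safe-navigation operator, sketches the same pattern; it is an illustration only and does not call Polars.

```ruby
# Plain-Ruby analogy for null propagation in starts_with; no Polars involved.
fruits = ["apple", "mango", nil]

# &. returns nil for a nil receiver instead of raising, so nil stays nil.
has_prefix = fruits.map { |v| v&.start_with?("app") }
p has_prefix # [true, false, nil]

# select keeps only truthy results, so both false and nil rows are dropped.
kept = fruits.select { |v| v&.start_with?("app") }
p kept # ["apple"]
```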