Class: Polars::StringNameSpace

Inherits:
Object
Defined in:
lib/polars/string_name_space.rb

Overview

Series.str namespace.

Instance Method Summary

Dynamic Method Handling

This class handles dynamic methods through the method_missing method in the class Polars::ExprDispatch.

Instance Method Details

#concat(delimiter = "-") ⇒ Series

Vertically concatenate the values in the Series into a single string value.

Examples:

Polars::Series.new([1, nil, 2]).str.concat("-")[0]
# => "1-null-2"

Parameters:

  • delimiter (String) (defaults to: "-")

    The delimiter to insert between consecutive string values.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 125

def concat(delimiter = "-")
  super
end

#contains(pattern, literal: false) ⇒ Series

Check if strings in Series contain a substring that matches a regex.

Examples:

s = Polars::Series.new(["Crab", "cat and dog", "rab$bit", nil])
s.str.contains("cat|bit")
# =>
# shape: (4,)
# Series: '' [bool]
# [
#         false
#         true
#         true
#         null
# ]
s.str.contains("rab$", literal: true)
# =>
# shape: (4,)
# Series: '' [bool]
# [
#         false
#         false
#         true
#         null
# ]

Parameters:

  • pattern (String)

    A valid regex pattern.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.
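
The difference between regex and literal matching can be illustrated with a plain-Ruby sketch. It does not call Polars: contains_sketch is a hypothetical helper, and it relies on Ruby's Regexp engine, whose syntax may differ in places from the regex engine Polars uses.

```ruby
# Plain-Ruby illustration of regex vs. literal matching
# (an assumption-based sketch, not the Polars implementation).
# contains_sketch is a hypothetical helper name.
def contains_sketch(values, pattern, literal: false)
  # With literal: true, regex metacharacters such as "$" are escaped first.
  regex = literal ? Regexp.new(Regexp.escape(pattern)) : Regexp.new(pattern)
  # nil stays nil, mirroring the null entries in the examples above.
  values.map { |v| v.nil? ? nil : v.match?(regex) }
end

values = ["Crab", "cat and dog", "rab$bit", nil]
p contains_sketch(values, "cat|bit")
p contains_sketch(values, "rab$", literal: true)
```

In the second call the "$" is matched as an ordinary character, which is why only "rab$bit" qualifies.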

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 162

def contains(pattern, literal: false)
  super
end

#count_match(pattern) ⇒ Series

Count all successive non-overlapping regex matches.

Examples:

s = Polars::Series.new("foo", ["123 bla 45 asd", "xyz 678 910t"])
s.str.count_match('\d')
# =>
# shape: (2,)
# Series: 'foo' [u32]
# [
#         5
#         6
# ]

Parameters:

  • pattern (String)

    A valid regex pattern

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 361

def count_match(pattern)
  super
end

#decode(encoding, strict: false) ⇒ Series

Decode a value using the provided encoding.

Examples:

s = Polars::Series.new(["666f6f", "626172", nil])
s.str.decode("hex")
# =>
# shape: (3,)
# Series: '' [str]
# [
#         "foo"
#         "bar"
#         null
# ]

Parameters:

  • encoding ("hex", "base64")

    The encoding to use.

  • strict (Boolean) (defaults to: false)

    How to handle invalid inputs:

    • true: An error will be thrown if unable to decode a value.
    • false: Unhandled values will be replaced with nil.
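
The effect of the strict flag can be sketched in plain Ruby for the "hex" case. This is only an illustration of the behavior described above, not how Polars implements it; decode_hex_sketch is a hypothetical helper and the error class it raises is an assumption.

```ruby
# Plain-Ruby sketch of strict vs. non-strict hex decoding
# (illustrative only; decode_hex_sketch is a hypothetical helper).
def decode_hex_sketch(values, strict: false)
  values.map do |v|
    next nil if v.nil?
    if v.match?(/\A(?:\h\h)*\z/)
      # Valid input: an even number of hex digits.
      [v].pack("H*")
    elsif strict
      # strict: true -> fail on the first value that cannot be decoded.
      raise ArgumentError, "invalid hex value: #{v.inspect}"
    else
      # strict: false -> replace the undecodable value with nil.
      nil
    end
  end
end

p decode_hex_sketch(["666f6f", "626172", nil])
p decode_hex_sketch(["666f6f", "not hex"])
```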

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 233

def decode(encoding, strict: false)
  super
end

#encode(encoding) ⇒ Series

Encode a value using the provided encoding.

Examples:

s = Polars::Series.new(["foo", "bar", nil])
s.str.encode("hex")
# =>
# shape: (3,)
# Series: '' [str]
# [
#         "666f6f"
#         "626172"
#         null
# ]

Parameters:

  • encoding ("hex", "base64")

    The encoding to use.
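
For reference, the two supported encodings can be reproduced on single strings with Ruby's standard library. The sketch below is a plain-Ruby illustration rather than the Polars implementation, and encode_sketch is a hypothetical helper name.

```ruby
# Plain-Ruby sketch of the "hex" and "base64" encodings
# (illustrative only; encode_sketch is a hypothetical helper).
def encode_sketch(values, encoding)
  values.map do |v|
    next nil if v.nil?
    case encoding
    when "hex"    then v.unpack1("H*")   # bytes -> lowercase hex digits
    when "base64" then [v].pack("m0")    # base64 without line breaks
    else raise ArgumentError, "unsupported encoding: #{encoding}"
    end
  end
end

p encode_sketch(["foo", "bar", nil], "hex")
p encode_sketch(["foo", "bar", nil], "base64")
```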

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 255

def encode(encoding)
  super
end

#ends_with(sub) ⇒ Series

Check if string values end with a substring.

Examples:

s = Polars::Series.new("fruits", ["apple", "mango", nil])
s.str.ends_with("go")
# =>
# shape: (3,)
# Series: 'fruits' [bool]
# [
#         false
#         true
#         null
# ]

Parameters:

  • sub (String)

    Suffix substring.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 184

def ends_with(sub)
  super
end

#extract(pattern, group_index: 1) ⇒ Series

Extract the target capture group from provided patterns.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select([Polars.col("foo").str.extract('(\d+)')])
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ 123 │
# ├╌╌╌╌╌┤
# │ 678 │
# └─────┘
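
How capture-group indexing works (group 0 is the whole match, group 1 the first capture group) can be sketched in plain Ruby. extract_sketch is a hypothetical helper that uses Ruby's Regexp engine, so it only illustrates the indexing convention and is not the Polars implementation.

```ruby
# Plain-Ruby sketch of capture-group indexing
# (illustrative only; extract_sketch is a hypothetical helper).
def extract_sketch(value, pattern, group_index: 1)
  return nil if value.nil?
  match = Regexp.new(pattern).match(value)
  # MatchData#[] uses the same convention: 0 = whole match, 1 = first group.
  match && match[group_index]
end

p extract_sketch("123 bla 45 asd", '(\d+)')
p extract_sketch("xyz 678 910t", '(\d+) (\d+)', group_index: 2)
p extract_sketch("xyz 678 910t", '(\d+) (\d+)', group_index: 0)
```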

Parameters:

  • pattern (String)

    A valid regex pattern

  • group_index (Integer) (defaults to: 1)

    Index of the targeted capture group. Group 0 means the whole pattern; the first capture group begins at index 1. Defaults to the first capture group.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 316

def extract(pattern, group_index: 1)
  super
end

#extract_all(pattern) ⇒ Series

Extracts all matches for the given regex pattern.

Extracts each successive non-overlapping regex match in an individual string and returns the matches as an array.

Examples:

s = Polars::Series.new("foo", ["123 bla 45 asd", "xyz 678 910t"])
s.str.extract_all('(\d+)')
# =>
# shape: (2,)
# Series: 'foo' [list]
# [
#         ["123", "45"]
#         ["678", "910"]
# ]
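
The "successive non-overlapping matches" behavior of extract_all and count_match corresponds closely to Ruby's String#scan. The sketch below is plain Ruby, not Polars; the helper names are hypothetical.

```ruby
# Plain-Ruby sketch of successive non-overlapping matching
# (illustrative only; both helper names are hypothetical).
def extract_all_sketch(values, regex)
  # Note: String#scan returns nested arrays when the regex has capture
  # groups, so this sketch expects a pattern without groups.
  values.map { |v| v.nil? ? nil : v.scan(regex) }
end

def count_match_sketch(values, regex)
  values.map { |v| v.nil? ? nil : v.scan(regex).length }
end

values = ["123 bla 45 asd", "xyz 678 910t"]
p extract_all_sketch(values, /\d+/)
p count_match_sketch(values, /\d/)
```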

Parameters:

  • pattern (String)

    A valid regex pattern

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 340

def extract_all(pattern)
  super
end

#json_path_match(json_path) ⇒ Series

Extract the first match from a JSON string using the provided JSONPath expression.

Throws an error if an invalid JSON string is encountered. All return values are cast to Utf8 regardless of their original type.

See the JSONPath standard documentation for the query syntax.

Examples:

df = Polars::DataFrame.new(
  {"json_val" => ['{"a":"1"}', nil, '{"a":2}', '{"a":2.1}', '{"a":true}']}
)
df.select(Polars.col("json_val").str.json_path_match("$.a"))[0.., 0]
# =>
# shape: (5,)
# Series: 'json_val' [str]
# [
#         "1"
#         null
#         "2"
#         "2.1"
#         "true"
# ]

Parameters:

  • json_path (String)

    A valid JSON path query string.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 287

def json_path_match(json_path)
  super
end

#lengths ⇒ Series

Note:

The returned lengths are equal to the number of bytes in the UTF-8 string. If you need the length in terms of the number of characters, use n_chars instead.

Get length of the string values in the Series (as number of bytes).

Examples:

s = Polars::Series.new(["Café", nil, "345", "東京"])
s.str.lengths
# =>
# shape: (4,)
# Series: '' [u32]
# [
#         5
#         null
#         3
#         6
# ]
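
The byte-versus-character distinction can be checked on the same values in plain Ruby, where String#bytesize and String#length play the roles of lengths and n_chars. This is an analogy using Ruby's own String class, not a call into Polars, and the helper names are hypothetical.

```ruby
# Plain-Ruby analogy for byte length vs. character length
# (illustrative only; helper names are hypothetical).
def byte_lengths_sketch(values)
  values.map { |v| v&.bytesize }   # UTF-8 bytes, like lengths
end

def char_lengths_sketch(values)
  values.map { |v| v&.length }     # characters, like n_chars
end

values = ["Café", nil, "345", "東京"]
p byte_lengths_sketch(values)
p char_lengths_sketch(values)
```

For pure ASCII input such as "345" both counts agree; they diverge as soon as a character needs more than one byte.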

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 87

def lengths
  super
end

#ljust(width, fillchar = " ") ⇒ Series

Return the string left justified in a string of length width.

Padding is done using the specified fillchar. The original string is returned if width is less than or equal to s.length.

Examples:

s = Polars::Series.new("a", ["cow", "monkey", nil, "hippopotamus"])
s.str.ljust(8, "*")
# =>
# shape: (4,)
# Series: 'a' [str]
# [
#         "cow*****"
#         "monkey**"
#         null
#         "hippopotamus"
# ]
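
Ruby's own String#ljust and String#rjust follow the same padding rule, including returning the original string when it is already at least as long as the requested width. The plain-Ruby sketch below reproduces the example on an Array; it is an analogy and does not call Polars, and the helper names are hypothetical.

```ruby
# Plain-Ruby analogy for left/right justification
# (illustrative only; helper names are hypothetical).
def ljust_sketch(values, width, fillchar = " ")
  values.map { |v| v&.ljust(width, fillchar) }
end

def rjust_sketch(values, width, fillchar = " ")
  values.map { |v| v&.rjust(width, fillchar) }
end

values = ["cow", "monkey", nil, "hippopotamus"]
p ljust_sketch(values, 8, "*")
p rjust_sketch(values, 8, "*")
```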

Parameters:

  • width (Integer)

    Justify left to this length.

  • fillchar (String) (defaults to: " ")

    Fill with this ASCII character.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 605

def ljust(width, fillchar = " ")
  super
end

#lstrip(matches = nil) ⇒ Series

Remove leading whitespace.

Parameters:

  • matches (String, nil) (defaults to: nil)

    An optional single character that should be trimmed

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 550

def lstrip(matches = nil)
  super
end

#n_chars ⇒ Series

Note:

If you know that you are working with ASCII text, lengths will be equivalent, and faster (returns length in terms of the number of bytes).

Get length of the string values in the Series (as number of chars).

Examples:

s = Polars::Series.new(["Café", nil, "345", "東京"])
s.str.n_chars
# =>
# shape: (4,)
# Series: '' [u32]
# [
#         4
#         null
#         3
#         2
# ]

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 111

def n_chars
  super
end

#replace(pattern, value, literal: false) ⇒ Series

Replace first matching regex/literal substring with a new string value.

Examples:

s = Polars::Series.new(["123abc", "abc456"])
s.str.replace('abc\b', "ABC")
# =>
# shape: (2,)
# Series: '' [str]
# [
#         "123ABC"
#         "abc456"
# ]

Parameters:

  • pattern (String)

    A valid regex pattern.

  • value (String)

    Replacement string.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 505

def replace(pattern, value, literal: false)
  super
end

#replace_all(pattern, value, literal: false) ⇒ Series

Replace all matching regex/literal substrings with a new string value.

Examples:

s = Polars::Series.new(["abcabc", "123a123"])
s.str.replace_all("a", "-")
# =>
# shape: (2,)
# Series: '' [str]
# [
#         "-bc-bc"
#         "123-123"
# ]
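
The difference between replace (first match only) and replace_all (every match) mirrors Ruby's String#sub and String#gsub. The following plain-Ruby sketch illustrates that relationship, including the literal flag; the helper names are hypothetical and the code uses Ruby's Regexp engine rather than Polars.

```ruby
# Plain-Ruby sketch of first-match vs. all-match replacement
# (illustrative only; helper names are hypothetical).
def replace_sketch(value, pattern, replacement, literal: false)
  return nil if value.nil?
  # A String pattern is matched literally by sub; a Regexp is not.
  target = literal ? pattern : Regexp.new(pattern)
  value.sub(target, replacement)
end

def replace_all_sketch(value, pattern, replacement, literal: false)
  return nil if value.nil?
  target = literal ? pattern : Regexp.new(pattern)
  value.gsub(target, replacement)
end

p replace_sketch("abcabc", "a", "-")
p replace_all_sketch("abcabc", "a", "-")
p replace_all_sketch("a.b.c", ".", "-", literal: true)
```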

Parameters:

  • pattern (String)

    A valid regex pattern.

  • value (String)

    Replacement string.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 530

def replace_all(pattern, value, literal: false)
  super
end

#rjust(width, fillchar = " ") ⇒ Series

Return the string right justified in a string of length width.

Padding is done using the specified fillchar. The original string is returned if width is less than or equal to s.length.

Examples:

s = Polars::Series.new("a", ["cow", "monkey", nil, "hippopotamus"])
s.str.rjust(8, "*")
# =>
# shape: (4,)
# Series: 'a' [str]
# [
#         "*****cow"
#         "**monkey"
#         null
#         "hippopotamus"
# ]

Parameters:

  • width (Integer)

    Justify right to this length.

  • fillchar (String) (defaults to: " ")

    Fill with this ASCII character.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 633

def rjust(width, fillchar = " ")
  super
end

#rstrip(matches = nil) ⇒ Series

Remove trailing whitespace.

Parameters:

  • matches (String, nil) (defaults to: nil)

    An optional single character that should be trimmed

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 560

def rstrip(matches = nil)
  super
end

#slice(offset, length = nil) ⇒ Series

Create subslices of the string values of a Utf8 Series.

Examples:

s = Polars::Series.new("s", ["pear", nil, "papaya", "dragonfruit"])
s.str.slice(-3)
# =>
# shape: (4,)
# Series: 's' [str]
# [
#         "ear"
#         null
#         "aya"
#         "uit"
# ]

Using the optional length parameter:

s.str.slice(4, 3)
# =>
# shape: (4,)
# Series: 's' [str]
# [
#         ""
#         null
#         "ya"
#         "onf"
# ]

Parameters:

  • offset (Integer)

    Start index. Negative indexing is supported.

  • length (Integer) (defaults to: nil)

    Length of the slice. If set to nil (default), the slice is taken to the end of the string.
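
The offset and length semantics shown in the examples match Ruby's own String indexing for these inputs. The sketch below is plain Ruby and slice_sketch is a hypothetical helper; note that Ruby returns nil when the offset lies outside the string, which may differ from what Polars returns in that case.

```ruby
# Plain-Ruby sketch of offset/length slicing
# (illustrative only; slice_sketch is a hypothetical helper).
def slice_sketch(value, offset, length = nil)
  return nil if value.nil?
  # Without a length, take everything from offset to the end of the string.
  length.nil? ? value[offset..] : value[offset, length]
end

values = ["pear", nil, "papaya", "dragonfruit"]
p values.map { |v| slice_sketch(v, -3) }
p values.map { |v| slice_sketch(v, 4, 3) }
```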

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 685

def slice(offset, length = nil)
  s = Utils.wrap_s(_s)
  s.to_frame.select(Polars.col(s.name).str.slice(offset, length)).to_series
end

#split(by, inclusive: false) ⇒ Series

Split the string by a substring.

Parameters:

  • by (String)

    Substring to split by.

  • inclusive (Boolean) (defaults to: false)

    If true, include the split character/string in the results.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 373

def split(by, inclusive: false)
  super
end

#split_exact(by, n, inclusive: false) ⇒ Series

Split the string by a substring using n splits.

Results in a struct of n+1 fields.

If it cannot make n splits, the remaining field elements will be null.

Examples:

df = Polars::DataFrame.new({"x" => ["a_1", nil, "c", "d_4"]})
df["x"].str.split_exact("_", 1).alias("fields")
# =>
# shape: (4,)
# Series: 'fields' [struct[2]]
# [
#         {"a","1"}
#         {null,null}
#         {"c",null}
#         {"d","4"}
# ]

Split string values in column x into exactly 2 parts and assign each part to a new column.

df["x"]
  .str.split_exact("_", 1)
  .struct.rename_fields(["first_part", "second_part"])
  .alias("fields")
  .to_frame
  .unnest("fields")
# =>
# shape: (4, 2)
# ┌────────────┬─────────────┐
# │ first_part ┆ second_part │
# │ ---        ┆ ---         │
# │ str        ┆ str         │
# ╞════════════╪═════════════╡
# │ a          ┆ 1           │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ null       ┆ null        │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ c          ┆ null        │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ d          ┆ 4           │
# └────────────┴─────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • n (Integer)

    Number of splits to make.

  • inclusive (Boolean) (defaults to: false)

    If true, include the split character/string in the results.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 427

def split_exact(by, n, inclusive: false)
  super
end

#splitn(by, n) ⇒ Series

Split the string by a substring, restricted to returning at most n items.

If the number of possible splits is less than n-1, the remaining field elements will be null. If the number of possible splits is n-1 or greater, the last (nth) substring will contain the remainder of the string.

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", nil, "foo-bar", "foo bar baz"]})
df["s"].str.splitn(" ", 2).alias("fields")
# =>
# shape: (4,)
# Series: 'fields' [struct[2]]
# [
#         {"foo","bar"}
#         {null,null}
#         {"foo-bar",null}
#         {"foo","bar baz"}
# ]

Split string values in column s into at most 2 parts and assign each part to a new column.

df["s"]
  .str.splitn(" ", 2)
  .struct.rename_fields(["first_part", "second_part"])
  .alias("fields")
  .to_frame
  .unnest("fields")
# =>
# shape: (4, 2)
# ┌────────────┬─────────────┐
# │ first_part ┆ second_part │
# │ ---        ┆ ---         │
# │ str        ┆ str         │
# ╞════════════╪═════════════╡
# │ foo        ┆ bar         │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ null       ┆ null        │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ foo-bar    ┆ null        │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ foo        ┆ bar baz     │
# └────────────┴─────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • n (Integer)

    Max number of items to return.
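
The "at most n items, remainder in the last item, nil padding" rule can be sketched in plain Ruby using String#split with a limit. splitn_sketch is a hypothetical helper, not the Polars implementation, and edge cases such as empty strings may behave differently in Polars.

```ruby
# Plain-Ruby sketch of splitting into at most n items
# (illustrative only; splitn_sketch is a hypothetical helper).
def splitn_sketch(value, by, n)
  # A nil input yields n nil fields.
  return Array.new(n) if value.nil?
  # With a limit, the last item keeps the unsplit remainder.
  # Caveat: Ruby treats a single-space delimiter as "split on whitespace",
  # which ignores leading and repeated spaces.
  parts = value.split(by, n)
  # Pad with nil when fewer than n items were produced.
  parts + Array.new(n - parts.length)
end

values = ["foo bar", nil, "foo-bar", "foo bar baz"]
values.each { |v| p splitn_sketch(v, " ", 2) }
```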

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 479

def splitn(by, n)
  s = Utils.wrap_s(_s)
  s.to_frame.select(Polars.col(s.name).str.splitn(by, n)).to_series
end

#starts_with(sub) ⇒ Series

Check if string values start with a substring.

Examples:

s = Polars::Series.new("fruits", ["apple", "mango", nil])
s.str.starts_with("app")
# =>
# shape: (3,)
# Series: 'fruits' [bool]
# [
#         true
#         false
#         null
# ]

Parameters:

  • sub (String)

    Prefix substring.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 206

def starts_with(sub)
  super
end

#strip(matches = nil) ⇒ Series

Remove leading and trailing whitespace.

Parameters:

  • matches (String, nil) (defaults to: nil)

    An optional single character that should be trimmed
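
A plain-Ruby sketch of the two modes (whitespace by default, a specific character when matches is given) is shown below. strip_sketch is a hypothetical helper, and it assumes that every leading and trailing occurrence of the character is removed; it is not the Polars implementation.

```ruby
# Plain-Ruby sketch of stripping whitespace or a single character
# (illustrative only; strip_sketch is a hypothetical helper).
def strip_sketch(value, matches = nil)
  return nil if value.nil?
  # Default: remove leading and trailing whitespace.
  return value.strip if matches.nil?
  # Otherwise remove runs of the given character at both ends
  # (assumption: all repeated occurrences are trimmed).
  ch = Regexp.escape(matches)
  value.gsub(/\A#{ch}+|#{ch}+\z/, "")
end

p strip_sketch("  hello  ")
p strip_sketch("xxhixx", "x")
p strip_sketch("xhixhix", "x")
```

Occurrences of the character in the middle of the string are left untouched, as the last call shows.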

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 540

def strip(matches = nil)
  super
end

#strptime(datatype, fmt = nil, strict: true, exact: true, cache: true, tz_aware: false) ⇒ Series

Parse a Series of dtype Utf8 to a Date/Datetime Series.

Examples:

s = Polars::Series.new(
  "date",
  [
    "2021-04-22",
    "2022-01-04 00:00:00",
    "01/31/22",
    "Sun Jul  8 00:34:60 2001"
  ]
)
s.to_frame.with_column(
  Polars.col("date")
    .str.strptime(:date, "%F", strict: false)
    .fill_null(
      Polars.col("date").str.strptime(:date, "%F %T", strict: false)
    )
    .fill_null(Polars.col("date").str.strptime(:date, "%D", strict: false))
    .fill_null(Polars.col("date").str.strptime(:date, "%c", strict: false))
)
# =>
# shape: (4, 1)
# ┌────────────┐
# │ date       │
# │ ---        │
# │ date       │
# ╞════════════╡
# │ 2021-04-22 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2022-01-04 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2022-01-31 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2001-07-08 │
# └────────────┘

Parameters:

  • datatype (Symbol)

    :date, :datetime, or :time.

  • fmt (String) (defaults to: nil)

    Format to use, refer to the chrono strftime documentation for specification. Example: "%y-%m-%d".

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)
    • If true, require an exact format match.
    • If false, allow the format to match anywhere in the target string.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 63

def strptime(datatype, fmt = nil, strict: true, exact: true, cache: true, tz_aware: false)
  super
end

#to_lowercase ⇒ Series

Modify the strings to their lowercase equivalent.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 640

def to_lowercase
  super
end

#to_uppercase ⇒ Series

Modify the strings to their uppercase equivalent.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 647

def to_uppercase
  super
end

#zfill(alignment) ⇒ Series

Fills the string with zeroes.

Return a copy of the string left filled with ASCII '0' digits to make a string of length alignment.

A leading sign prefix ('+'/'-') is handled by inserting the padding after the sign character rather than before. The original string is returned if alignment is less than or equal to s.length.

Parameters:

  • alignment (Integer)

    Fill the value up to this length.
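
The sign-aware padding rule described above can be sketched in plain Ruby. zfill_sketch is a hypothetical helper written from the description only; it is not the Polars implementation and has not been checked against it.

```ruby
# Plain-Ruby sketch of sign-aware zero filling
# (illustrative only; zfill_sketch is a hypothetical helper).
def zfill_sketch(value, alignment)
  return nil if value.nil?
  # Already long enough: return the original string unchanged.
  return value if alignment <= value.length
  # Keep a leading "+" or "-" in front and pad the rest with zeroes.
  sign = value.start_with?("+", "-") ? value[0] : ""
  rest = value[sign.length..]
  sign + rest.rjust(alignment - sign.length, "0")
end

p zfill_sketch("7", 3)
p zfill_sketch("-42", 5)
p zfill_sketch("12345", 3)
```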

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 577

def zfill(alignment)
  super
end