Module: Polars::Functions
Included in: Polars
Defined in: lib/polars/functions.rb
Instance Method Summary
- #align_frames(*frames, on:, select: nil, reverse: false) ⇒ Object
  Align a sequence of frames using the unique values from one or more columns as a key.
- #concat(items, rechunk: true, how: "vertical", parallel: true) ⇒ Object
  Aggregate multiple DataFrames/Series to a single DataFrame/Series.
- #date_range(low, high, interval, lazy: false, closed: "both", name: nil, time_unit: nil, time_zone: nil) ⇒ Object
  Create a range of type Datetime (or Date).
- #get_dummies(df, columns: nil) ⇒ DataFrame
  Convert categorical variables into dummy/indicator variables.
- #ones(n, dtype: nil) ⇒ Series
  Return a new Series of given length and type, filled with ones.
- #zeros(n, dtype: nil) ⇒ Series
  Return a new Series of given length and type, filled with zeros.
Instance Method Details
#align_frames(*frames, on:, select: nil, reverse: false) ⇒ Object
Align a sequence of frames using the unique values from one or more columns as a key.
Frames that do not contain the given key values have rows injected (with nulls filling the non-key columns), and each resulting frame is sorted by the key.
The original column order of input frames is not changed unless select is specified (in which case the final column order is determined from that).
Note that this does not result in a joined frame - you receive the same number of frames back that you passed in, but each is now aligned by key and has the same number of rows.
# File 'lib/polars/functions.rb', line 359
def align_frames(
  *frames,
  on:,
  select: nil,
  reverse: false
)
  if frames.empty?
    return []
  elsif frames.map(&:class).uniq.length != 1
    raise TypeError, "Input frames must be of a consistent type (all LazyFrame or all DataFrame)"
  end

  # establish the superset of all "on" column values, sort, and cache
  eager = frames[0].is_a?(DataFrame)
  alignment_frame = (
    concat(frames.map { |df| df.lazy.select(on) })
      .unique(maintain_order: false)
      .sort(on, reverse: reverse)
  )
  alignment_frame = (
    eager ? alignment_frame.collect.lazy : alignment_frame.cache
  )
  # finally, align all frames
  aligned_frames =
    frames.map do |df|
      alignment_frame.join(
        df.lazy,
        on: alignment_frame.columns,
        how: "left"
      ).select(df.columns)
    end
  if !select.nil?
    aligned_frames = aligned_frames.map { |df| df.select(select) }
  end

  eager ? aligned_frames.map(&:collect) : aligned_frames
end
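The alignment steps in the listing (collect the superset of key values, sort it, then left-join every frame onto it) can be modelled in plain Ruby. This is only a sketch of the semantics and does not call Polars: rows are hashes, frames are arrays of rows, and `align_rows` is a hypothetical helper name introduced here for illustration.

```ruby
# Plain-Ruby model of the alignment steps; NOT the Polars implementation.
# A "frame" is an Array of row Hashes; `align_rows` is a hypothetical helper.
def align_rows(*frames, on:, reverse: false)
  return [] if frames.empty?

  # 1. Superset of all key values across the frames, deduplicated and sorted.
  keys = frames.flat_map { |rows| rows.map { |row| row[on] } }.uniq.sort
  keys.reverse! if reverse

  # 2. Left-join each frame onto the key list. Keys a frame lacks get an
  #    injected row whose non-key columns are nil.
  frames.map do |rows|
    columns = rows.flat_map(&:keys).uniq
    by_key = rows.group_by { |row| row[on] }
    keys.flat_map do |key|
      by_key.fetch(key) { [columns.to_h { |c| [c, c == on ? key : nil] }] }
    end
  end
end

a = [{ "k" => 1, "x" => 10 }, { "k" => 3, "x" => 30 }]
b = [{ "k" => 2, "y" => "b" }, { "k" => 3, "y" => "c" }]

# Two frames in, two frames out, each now holding one row per key (1, 2, 3).
aligned_a, aligned_b = align_rows(a, b, on: "k")
```

With the library itself, the corresponding call would presumably take the form `Polars.align_frames(df1, df2, on: "k")`, following the signature documented above.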
#concat(items, rechunk: true, how: "vertical", parallel: true) ⇒ Object
Aggregate multiple DataFrames/Series to a single DataFrame/Series.
# File 'lib/polars/functions.rb', line 49
def concat(items, rechunk: true, how: "vertical", parallel: true)
  if items.empty?
    raise ArgumentError, "cannot concat empty list"
  end

  first = items[0]
  if first.is_a?(DataFrame)
    if how == "vertical"
      out = Utils.wrap_df(_concat_df(items))
    elsif how == "diagonal"
      out = Utils.wrap_df(_diag_concat_df(items))
    elsif how == "horizontal"
      out = Utils.wrap_df(_hor_concat_df(items))
    else
      raise ArgumentError, "how must be one of {{'vertical', 'diagonal', 'horizontal'}}, got #{how}"
    end
  elsif first.is_a?(LazyFrame)
    if how == "vertical"
      # TODO
      return Utils.wrap_ldf(_concat_lf(items, rechunk, parallel))
    else
      raise ArgumentError, "Lazy only allows 'vertical' concat strategy."
    end
  elsif first.is_a?(Series)
    # TODO
    out = Utils.wrap_s(_concat_series(items))
  elsif first.is_a?(Expr)
    out = first
    items[1..-1].each do |e|
      out = out.append(e)
    end
  else
    raise ArgumentError, "did not expect type: #{first.class.name} in 'Polars.concat'."
  end

  if rechunk
    out.rechunk
  else
    out
  end
end
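The listing dispatches DataFrame inputs on `how`, but does not spell out what the three strategies do. The following plain-Ruby sketch models the usual meaning of each (stack rows, stack rows over the union of columns, place columns side by side). It does not call Polars: a frame is a Hash of column name to values, `concat_model` and `height` are hypothetical helpers, and the column checks are assumptions of this model rather than documented library behaviour.

```ruby
# Plain-Ruby model of the `how` strategies; NOT the Polars implementation.
# A "frame" is a Hash of column name => Array of values.
def height(frame)
  frame.values.first&.length || 0
end

def concat_model(frames, how: "vertical")
  raise ArgumentError, "cannot concat empty list" if frames.empty?

  case how
  when "vertical"
    # Stack rows; this model requires identical columns in every frame.
    unless frames.all? { |f| f.keys == frames[0].keys }
      raise ArgumentError, "vertical concat needs identical columns"
    end
    frames[0].keys.to_h { |c| [c, frames.flat_map { |f| f[c] }] }
  when "diagonal"
    # Stack rows over the union of columns; absent columns are nil-filled.
    columns = frames.flat_map(&:keys).uniq
    columns.to_h do |c|
      [c, frames.flat_map { |f| f.fetch(c) { Array.new(height(f)) } }]
    end
  when "horizontal"
    # Place columns side by side; this model rejects duplicate names.
    frames.reduce({}) do |acc, f|
      clash = acc.keys & f.keys
      raise ArgumentError, "duplicate columns: #{clash}" unless clash.empty?
      acc.merge(f)
    end
  else
    raise ArgumentError, "how must be one of 'vertical', 'diagonal', 'horizontal', got #{how}"
  end
end

base = { "a" => [1, 2], "b" => [3, 4] }

vertical   = concat_model([base, { "a" => [5], "b" => [6] }])
diagonal   = concat_model([base, { "a" => [5], "c" => ["x"] }], how: "diagonal")
horizontal = concat_model([base, { "c" => [7, 8] }], how: "horizontal")
```

Here `diagonal` ends up with columns `a`, `b`, and `c`, with nil wherever a frame had no value for a column.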
#date_range(low, high, interval, lazy: false, closed: "both", name: nil, time_unit: nil, time_zone: nil) ⇒ Object
Create a range of type Datetime (or Date).
Note: If both low and high are passed as date types (not datetime), and the interval granularity is no finer than 1d, the returned range is also of type date. All other permutations return a datetime Series.
# File 'lib/polars/functions.rb', line 148
def date_range(
  low,
  high,
  interval,
  lazy: false,
  closed: "both",
  name: nil,
  time_unit: nil,
  time_zone: nil
)
  if defined?(ActiveSupport::Duration) && interval.is_a?(ActiveSupport::Duration)
    raise Todo
  else
    interval = interval.to_s
    if interval.include?(" ")
      interval = interval.gsub(" ", "")
    end
  end

  if low.is_a?(Expr) || high.is_a?(Expr) || lazy
    low = Utils.expr_to_lit_or_expr(low, str_to_lit: true)
    high = Utils.expr_to_lit_or_expr(high, str_to_lit: true)
    return Utils.wrap_expr(
      _rb_date_range_lazy(low, high, interval, closed, name, time_zone)
    )
  end

  low, low_is_date = _ensure_datetime(low)
  high, high_is_date = _ensure_datetime(high)

  if !time_unit.nil?
    tu = time_unit
  elsif interval.include?("ns")
    tu = "ns"
  else
    tu = "us"
  end

  start = Utils.(low, tu)
  stop = Utils.(high, tu)
  if name.nil?
    name = ""
  end

  dt_range = Utils.wrap_s(
    _rb_date_range(start, stop, interval, closed, name, tu, time_zone)
  )
  if low_is_date && high_is_date && !["h", "m", "s"].any? { |v| _interval_granularity(interval).end_with?(v) }
    dt_range = dt_range.cast(Date)
  end

  dt_range
end
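Three small decisions in the listing are easy to miss: spaces are stripped from the interval string, the time unit defaults to "us" unless the interval mentions "ns", and the result is cast to Date only when both bounds are dates and the interval does not end in an hour, minute, or second unit. The sketch below restates them as standalone Ruby. The method names are hypothetical, and `returns_date?` assumes the private `_interval_granularity` helper yields the trailing unit of the interval, which the source shown here does not confirm.

```ruby
# Standalone restatement of three inline steps from date_range.
# All three method names are hypothetical; none exists in Polars.

# "1d 12h" and "1d12h" are treated as the same interval.
def normalize_interval(interval)
  interval.to_s.delete(" ")
end

# An explicit time_unit wins; otherwise "ns" intervals get "ns", all else "us".
def pick_time_unit(interval, time_unit: nil)
  return time_unit unless time_unit.nil?

  interval.include?("ns") ? "ns" : "us"
end

# ASSUMPTION: the granularity is the trailing unit of the interval string,
# so checking the interval's own suffix is equivalent.
def returns_date?(low_is_date, high_is_date, interval)
  low_is_date && high_is_date && !interval.end_with?("h", "m", "s")
end

normalized = normalize_interval("1d 12h")

unit_default  = pick_time_unit("1d")
unit_nano     = pick_time_unit("500ns")
unit_explicit = pick_time_unit("1d", time_unit: "ms")

daily_dates  = returns_date?(true, true, "1d")
hourly_dates = returns_date?(true, true, "12h")
mixed_bounds = returns_date?(true, false, "1d")
```

Because "ms", "us", and "ns" all end in "s", the single suffix check also covers sub-second intervals.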
#get_dummies(df, columns: nil) ⇒ DataFrame
Convert categorical variables into dummy/indicator variables.
# File 'lib/polars/functions.rb', line 12
def get_dummies(df, columns: nil)
  df.to_dummies(columns: columns)
end
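For readers unfamiliar with dummy/indicator encoding, the plain-Ruby sketch below shows the idea for a single column: each distinct value becomes its own 0/1 column. It does not call Polars. `dummies_for` is a hypothetical helper, and both the "<column>_<value>" naming and the sorted column order are assumptions made for this illustration.

```ruby
# Plain-Ruby model of dummy/indicator encoding; NOT the Polars implementation.
# `dummies_for` is a hypothetical helper. Naming and ordering are assumptions.
def dummies_for(name, values)
  values.uniq.sort.to_h do |category|
    # One indicator column per distinct value: 1 where the row matches, else 0.
    ["#{name}_#{category}", values.map { |v| v == category ? 1 : 0 }]
  end
end

encoded = dummies_for("color", ["red", "blue", "red"])
```

In this model `encoded` holds a `color_blue` column and a `color_red` column, and every row has exactly one 1 across them.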
#ones(n, dtype: nil) ⇒ Series
Return a new Series of given length and type, filled with ones.
Note: In the lazy API you should probably use lit(1) instead of this method.
# File 'lib/polars/functions.rb', line 409
def ones(n, dtype: nil)
  s = Series.new([1.0])
  if dtype
    s = s.cast(dtype)
  end
  s.new_from_index(0, n)
end
#zeros(n, dtype: nil) ⇒ Series
Return a new Series of given length and type, filled with zeros.
Note: In the lazy API you should probably use lit(0) instead of this method.
# File 'lib/polars/functions.rb', line 429
def zeros(n, dtype: nil)
  s = Series.new([0.0])
  if dtype
    s = s.cast(dtype)
  end
  s.new_from_index(0, n)
end