Module: DaruLite::DataFrame::Sortable

Included in:
DaruLite::DataFrame
Defined in:
lib/daru_lite/data_frame/sortable.rb

Instance Method Details

#order=(order_array) ⇒ Object

Reorder the vectors in a dataframe

Examples:

df = DaruLite::DataFrame.new({
  a: [1, 2, 3],
  b: [4, 5, 6]
}, order: [:a, :b])
df.order = [:b, :a]
df
# => #<DaruLite::DataFrame(3x2)>
#       b   a
#   0   4   1
#   1   5   2
#   2   6   3

Parameters:

  • order_array (Array)

    new order of the vectors

Raises:

  • (ArgumentError)

    if order_array does not contain exactly the current vector names, each the same number of times


# File 'lib/daru_lite/data_frame/sortable.rb', line 18

def order=(order_array)
  raise ArgumentError, 'Invalid order' unless order_array.tally == vectors.to_a.tally

  initialize(to_h, order: order_array)
end
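
The guard above compares tallies rather than sets, so it rejects a missing name as well as a duplicated one. A minimal sketch of that check on plain arrays follows; the helper name valid_order? is hypothetical and not part of DaruLite.

```ruby
# Hypothetical helper mirroring the guard in order=; not part of DaruLite.
# Two arrays have equal tallies only when they hold the same elements,
# each the same number of times, regardless of position.
def valid_order?(current, proposed)
  proposed.tally == current.tally
end

current = [:a, :b]

permutation_ok = valid_order?(current, [:b, :a])     # same names, new order
missing_ok     = valid_order?(current, [:a])         # :b is missing
duplicate_ok   = valid_order?(current, [:a, :a, :b]) # :a appears twice
```

Only the first call returns true; the other two correspond to inputs for which order= would raise ArgumentError.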

#rotate_vectors(count = -1) ⇒ Object

Rotates the positions of the vectors in place and returns the dataframe: the vector at position count becomes the first vector. If the dataframe has only one vector, it is returned unchanged.

Examples:

df = DaruLite::DataFrame({
  a: [1, 2, 3],
  b: [4, 5, 6],
  total: [5, 7, 9],
})
df.rotate_vectors(-1)
df
# => #<DaruLite::DataFrame(3x3)>
#       total a   b
#   0   5     1   4
#   1   7     2   5
#   2   9     3   6

Parameters:

  • count (Integer) (defaults to: -1)

    the vector at position count will be the first vector of the dataframe.



# File 'lib/daru_lite/data_frame/sortable.rb', line 41

def rotate_vectors(count = -1)
  return self unless vectors.many?

  self.order = vectors.to_a.rotate(count)
  self
end
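
Because the method delegates to Array#rotate, the resulting vector order can be previewed on a plain array of names. A small sketch, assuming the vector names from the example above:

```ruby
# Array#rotate returns a new array; the element at index `count`
# becomes the first element. Negative counts index from the end.
names = [:a, :b, :total]

last_first   = names.rotate(-1) # => [:total, :a, :b]
second_first = names.rotate(1)  # => [:b, :total, :a]
third_first  = names.rotate(2)  # => [:total, :a, :b]
```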

#sort(vector_order, opts = {}) ⇒ Object

Non-destructive version of #sort!



# File 'lib/daru_lite/data_frame/sortable.rb', line 155

def sort(vector_order, opts = {})
  dup.sort! vector_order, opts
end

#sort!(vector_order, opts = {}) ⇒ Object

Sorts a dataframe (ascending/descending) in the given priority sequence of vectors, with or without a block.

Examples:

Sort a dataframe with a vector sequence.


df = DaruLite::DataFrame.new({a: [1,2,1,2,3], b: [5,4,3,2,1]})

df.sort [:a, :b]
# =>
# <DaruLite::DataFrame:30604000 @name = d6a9294e-2c09-418f-b646-aa9244653444 @size = 5>
#                   a          b
#        2          1          3
#        0          1          5
#        3          2          2
#        1          2          4
#        4          3          1

Sort a dataframe without a block. Here nils will be handled automatically.


df = DaruLite::DataFrame.new({a: [-3,nil,-1,nil,5], b: [4,3,2,1,4]})

df.sort([:a])
# =>
# <DaruLite::DataFrame:14810920 @name = c07fb5c7-2201-458d-b679-6a1f7ebfe49f @size = 5>
#                    a          b
#         1        nil          3
#         3        nil          1
#         0         -3          4
#         2         -1          2
#         4          5          4

Sort a dataframe with a block with nils handled automatically.


df = DaruLite::DataFrame.new({a: [nil,-1,1,nil,-1,1], b: ['aaa','aa',nil,'baaa','x',nil] })

df.sort [:b], by: {b: lambda { |a| a.length } }
# NoMethodError: undefined method `length' for nil:NilClass
# from (pry):8:in `block in __pry__'

df.sort [:b], by: {b: lambda { |a| a.length } }, handle_nils: true

# =>
# <DaruLite::DataFrame:28469540 @name = 5f986508-556f-468b-be0c-88cc3534445c @size = 6>
#                    a          b
#         2          1        nil
#         5          1        nil
#         4         -1          x
#         1         -1         aa
#         0        nil        aaa
#         3        nil       baaa

Sort a dataframe with a block with nils handled manually.


df = DaruLite::DataFrame.new({a: [nil,-1,1,nil,-1,1], b: ['aaa','aa',nil,'baaa','x',nil] })

# To print nils at the bottom one can use lambda { |a| a.nil? ? [1] : [0, a.length] }
df.sort [:b], by: {b: lambda { |a| a.nil? ? [1] : [0, a.length] } }, handle_nils: true

# =>
# <DaruLite::DataFrame:22214180 @name = cd7703c7-1dca-4560-840b-5ea51a852ef9 @size = 6>
#                 a          b
#      4         -1          x
#      1         -1         aa
#      0        nil        aaa
#      3        nil       baaa
#      2          1        nil
#      5          1        nil
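
The manual approach works because Ruby compares arrays element by element, so a key of [1] sorts after any key that starts with 0. A standalone sketch of that ordering on a plain array, without DaruLite, is given here; breaking ties by row position is an assumption made for this illustration.

```ruby
# Values of vector b from the example above.
b = ['aaa', 'aa', nil, 'baaa', 'x', nil]

# Same key as the lambda passed via :by. nil maps to [1];
# every other value maps to [0, length], which sorts before [1].
key = lambda { |v| v.nil? ? [1] : [0, v.length] }

# Sort row positions by key; ties fall back to the original position.
order = b.each_index.sort_by { |i| [key.call(b[i]), i] }
# => [4, 1, 0, 3, 2, 5]
```

The resulting positions match the index column of the output above, with both nil rows last.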

Parameters:

  • vector_order (Array)

    The order of vector names in which the DataFrame should be sorted.

  • opts (Hash) (defaults to: {})

    The options to sort with.

Options Hash (opts):

  • :ascending (TrueClass, FalseClass, Array) — default: true

    Sort in ascending or descending order. Specify Array corresponding to order for multiple sort orders.

  • :by (Hash) — default: lambda{|a| a }

    Specify the attributes of objects to be used for sorting, as a hash mapping each vector name to a lambda expression. If no lambda is specified for a vector, the default is used.

  • :handle_nils (TrueClass, FalseClass, Array) — default: false

    Whether to handle nils automatically when a block is provided. If set to true, nils will appear at the top after sorting.
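
To illustrate what an Array passed as :ascending means, the sketch below sorts plain rows with one direction flag per sort key. It shows only the intended semantics and is not DaruLite's internal comparator, which is not shown on this page.

```ruby
# Rows hold the values of vectors a and b; the flags mean a ascending, b descending.
rows = [[1, 5], [2, 4], [1, 3], [2, 2], [3, 1]]
ascending = [true, false]

sorted = rows.sort do |x, y|
  result = 0
  x.each_index do |k|
    # Swap the operands to reverse the direction for this key.
    result = ascending[k] ? x[k] <=> y[k] : y[k] <=> x[k]
    break unless result.zero? # the first non-equal key decides
  end
  result
end
# => [[1, 5], [1, 3], [2, 4], [2, 2], [3, 1]]
```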

Raises:

  • (ArgumentError)

    if vector_order is empty


# File 'lib/daru_lite/data_frame/sortable.rb', line 131

def sort!(vector_order, opts = {})
  raise ArgumentError, 'Required atleast one vector name' if vector_order.empty?

  # To enable sorting with categorical data,
  # map categories to integers preserving their order
  old = convert_categorical_vectors vector_order
  block = sort_prepare_block vector_order, opts

  order = @index.size.times.sort(&block)
  new_index = @index.reorder order

  # To reverse map mapping of categorical data to integers
  restore_categorical_vectors old

  @data.each do |vector|
    vector.reorder! order
  end

  self.index = new_index

  self
end
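
The core of the method is to sort row positions once and then apply that permutation to every vector and to the index. A plain-Ruby sketch of the same idea on a hash of arrays follows; the comparison block is a simplified stand-in for sort_prepare_block, whose behaviour is assumed here.

```ruby
# Columns of the first example above, stored as plain arrays.
columns = { a: [1, 2, 1, 2, 3], b: [5, 4, 3, 2, 1] }
sort_keys = [:a, :b]

# Sort row positions by comparing the key values of each pair of rows.
row_count = columns.values.first.size
order = row_count.times.sort do |i, j|
  sort_keys.map { |k| columns[k][i] } <=> sort_keys.map { |k| columns[k][j] }
end
# order => [2, 0, 3, 1, 4]

# Apply the same permutation to every column.
sorted = columns.transform_values { |values| values.values_at(*order) }
# sorted => { a: [1, 1, 2, 2, 3], b: [3, 5, 2, 4, 1] }
```

Computing the permutation once keeps all columns aligned, which is why the method reorders each vector with the same order array.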