Class: Remi::DataFrame::Daru

Inherits:
SimpleDelegator
  • Object
Includes:
Remi::DataFrame
Defined in:
lib/remi/data_frame/daru.rb

Class Method Summary

Instance Method Summary

Methods included from Remi::DataFrame

#[], create, daru, #size, #write_csv

Constructor Details

#initialize(*args, **kargs, &block) ⇒ Daru

Returns a new instance of Daru.



# File 'lib/remi/data_frame/daru.rb', line 6

def initialize(*args, **kargs, &block)
  if args[0].is_a? ::Daru::DataFrame
    super(args[0])
  else
    super(::Daru::DataFrame.new(*args, **kargs, &block))
  end
end
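
The constructor above either wraps an existing frame or builds a new one and delegates to it. The following is a minimal sketch of that wrap-or-build pattern using only the standard library. `FakeFrame` and `FrameWrapper` are hypothetical stand-ins (not part of Remi or Daru) so that the example runs without the daru gem.

```ruby
require 'delegate'

# Hypothetical stand-in for ::Daru::DataFrame (assumption, for illustration only).
class FakeFrame
  attr_reader :data

  def initialize(data = {})
    @data = data
  end

  # Number of records, taken from the first column.
  def size
    @data.values.first.to_a.size
  end
end

# Hypothetical wrapper that mirrors the constructor logic shown above.
class FrameWrapper < SimpleDelegator
  def initialize(*args, **kargs, &block)
    if args[0].is_a?(FakeFrame)
      super(args[0])                                # wrap the frame that was passed in
    else
      super(FakeFrame.new(*args, **kargs, &block))  # build a new frame from the arguments
    end
  end
end

existing = FakeFrame.new({ a: [1, 2, 3] })
wrapped  = FrameWrapper.new(existing)      # delegates to `existing` itself
built    = FrameWrapper.new({ a: [1, 2] }) # delegates to a newly built frame

puts wrapped.size
puts built.size
```

Because the wrapper is a SimpleDelegator, any method it does not define itself (here `size`) is forwarded to the wrapped object.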

Class Method Details

.from_hash_dump(filename) ⇒ Object

Public: Creates a DataFrame by loading a Marshal dump (as written by #hash_dump) from a file.



# File 'lib/remi/data_frame/daru.rb', line 26

def self.from_hash_dump(filename)
  Marshal.load(File.binread(filename))
end

Instance Method Details

#aggregate(by:, func:) ⇒ Object

Public: Allows the user to define an arbitrary aggregation function.

by   - The name of the DataFrame vector to use to group records.
func - A lambda function that accepts three arguments: the first
       argument is the DataFrame, the second is the key of the
       current group, and the third is the list of index values of
       the elements belonging to that group.

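Example (the lambda takes the vector name as an extra first argument, which is curried away so that func receives the remaining three):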

df = Remi::DataFrame::Daru.new( { a: ['a','a','a','b','b'], year: ['2018','2015','2019', '2014', '2013'] })

mymin = lambda do |vector, df, group_key, indices|
  values = indices.map { |idx| df.row[idx][vector] }
  "Group #{group_key} has a minimum value of #{values.min}"
end

df.aggregate(by: :a, func: mymin.curry.(:year))

Returns a Daru::Vector.



# File 'lib/remi/data_frame/daru.rb', line 50

def aggregate(by:, func:)
  grouped = self.group_by(by)
  df_indices = self.index.to_a
  ::Daru::Vector.new(
    grouped.groups.reduce({}) do |h, (key, indices)|
      # Daru groups don't use the index of the dataframe when returning groups (WTF?).
      # Instead they return the position of the record in the dataframe.  Here, we
      # convert those positions back to the dataframe's index values.
      group_df_indices = indices.map { |v| df_indices[v] }
      group_key = key.size == 1 ? key.first : key
      h[group_key] = func.(self, group_key, group_df_indices)
      h
    end
  )
end
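
The key step in the method above is translating record positions into index values before calling func. The sketch below reproduces that step in plain Ruby. The `index`, `rows`, and `by_index` variables are hypothetical stand-ins for a DataFrame and its index, and the grouping is done with `Enumerable#group_by` rather than Daru, so this illustrates the idea and is not the library's implementation.

```ruby
# Stand-ins (assumptions): `index` plays the role of the DataFrame index and
# `rows` holds the records in the same order.
index = [:r10, :r11, :r12, :r13, :r14]
rows  = [
  { a: 'a', year: '2018' },
  { a: 'a', year: '2015' },
  { a: 'a', year: '2019' },
  { a: 'b', year: '2014' },
  { a: 'b', year: '2013' }
]

# Records addressable by index value, like a lookup by index on a DataFrame.
by_index = index.zip(rows).to_h

# Group by :a. Keys are arrays and values are 0-based positions, matching
# the shape of the groups described in the source comment above.
groups = rows.each_index.group_by { |pos| [rows[pos][:a]] }

# Aggregation function with three arguments: data, group key, index values.
func = lambda do |data, _group_key, indices|
  indices.map { |idx| data[idx][:year] }.min
end

result = groups.reduce({}) do |h, (key, positions)|
  group_indices = positions.map { |pos| index[pos] } # positions -> index values
  group_key = key.size == 1 ? key.first : key        # unwrap single-column keys
  h[group_key] = func.(by_index, group_key, group_indices)
  h
end

p result
```

If the positions were passed to func unchanged, a lookup by index value would fail whenever the index is not simply 0, 1, 2, and so on, which is why the mapping step is needed.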

#hash_dump(filename) ⇒ Object

Public: Saves a DataFrame to a file as a Marshal dump.



# File 'lib/remi/data_frame/daru.rb', line 21

def hash_dump(filename)
  File.binwrite(filename, Marshal.dump(self))
end
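
Together, #hash_dump and .from_hash_dump form a Marshal round trip through a binary file. The sketch below shows the same round trip with only the standard library. A plain Hash stands in for the DataFrame (an assumption made so the example runs without the remi or daru gems), and the temporary file name is arbitrary.

```ruby
require 'tempfile'

# Stand-in for a DataFrame (assumption): any Marshal-compatible object works.
data = { a: %w[a a b], year: %w[2018 2015 2014] }
restored = nil

Tempfile.create('frame_dump') do |file|
  file.close
  # Same call as in #hash_dump: serialize and write in binary mode.
  File.binwrite(file.path, Marshal.dump(data))
  # Same call as in .from_hash_dump: read in binary mode and deserialize.
  restored = Marshal.load(File.binread(file.path))
end

p restored == data
```

Marshal.load can instantiate arbitrary objects, so it is best used only on files from a trusted source.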

#remi_df_type ⇒ Object

Public: Returns the type of DataFrame (:daru).



# File 'lib/remi/data_frame/daru.rb', line 16

def remi_df_type
  :daru
end