Module: DaruLite::DataFrame::Joinable

Included in:
DaruLite::DataFrame
Defined in:
lib/daru_lite/data_frame/joinable.rb

Instance Method Summary collapse

Instance Method Details

#concat(other_df) ⇒ Object

Concatenate another DataFrame along corresponding columns. If columns do not exist in both dataframes, they are filled with nils



6
7
8
9
10
11
12
13
14
# File 'lib/daru_lite/data_frame/joinable.rb', line 6

def concat(other_df)
  vectors = (@vectors.to_a + other_df.vectors.to_a).uniq

  data = vectors.map do |v|
    get_vector_anyways(v).dup.concat(other_df.get_vector_anyways(v))
  end

  DaruLite::DataFrame.new(data, order: vectors)
end

#join(other_df, opts = {}) ⇒ DaruLite::DataFrame

Join 2 DataFrames with SQL style joins. Currently supports inner, left outer, right outer and full outer joins.

Examples:

Inner Join

left = DaruLite::DataFrame.new({
  :id   => [1,2,3,4],
  :name => ['Pirate', 'Monkey', 'Ninja', 'Spaghetti']
})
right = DaruLite::DataFrame.new({
  :id => [1,2,3,4],
  :name => ['Rutabaga', 'Pirate', 'Darth Vader', 'Ninja']
})
left.join(right, how: :inner, on: [:name])
#=>
##<DaruLite::DataFrame:82416700 @name = 74c0811b-76c6-4c42-ac93-e6458e82afb0 @size = 2>
#                 id_1       name       id_2
#         0          1     Pirate          2
#         1          3      Ninja          4

Parameters:

  • other_df (DaruLite::DataFrame)

    Another DataFrame on which the join is to be performed.

  • opts (Hash) (defaults to: {})

    Options Hash

  • :how (Hash)

    a customizable set of options

  • :on (Hash)

    a customizable set of options

  • :indicator (Hash)

    a customizable set of options

Returns:



78
79
80
# File 'lib/daru_lite/data_frame/joinable.rb', line 78

def join(other_df, opts = {})
  DaruLite::Core::Merge.join(self, other_df, opts)
end

#merge(other_df) ⇒ DaruLite::DataFrame

Merge vectors from two DataFrames. In case of name collision, the vectors names are changed to x_1, x_2 .…

Returns:



33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
# File 'lib/daru_lite/data_frame/joinable.rb', line 33

def merge(other_df)
  unless nrows == other_df.nrows
    raise ArgumentError,
          "Number of rows must be equal in this: #{nrows} and other: #{other_df.nrows}"
  end

  new_fields = (@vectors.to_a + other_df.vectors.to_a)
  new_fields = ArrayHelper.recode_repeated(new_fields)
  DataFrame.new({}, order: new_fields).tap do |df_new|
    (0...nrows).each do |i|
      df_new.add_row row[i].to_a + other_df.row[i].to_a
    end
    df_new.index = @index if @index == other_df.index
    df_new.update
  end
end

#one_to_many(parent_fields, pattern) ⇒ Object

Creates a new dataset for one to many relations on a dataset, based on pattern of field names.

for example, you have a survey for number of children with this structure:

id, name, child_name_1, child_age_1, child_name_2, child_age_2

with

ds.one_to_many([:id], "child_%v_%n"

the field of first parameters will be copied verbatim to new dataset, and fields which responds to second pattern will be added one case for each different %n.

Examples:

cases=[
  ['1','george','red',10,'blue',20,nil,nil],
  ['2','fred','green',15,'orange',30,'white',20],
  ['3','alfred',nil,nil,nil,nil,nil,nil]
]
ds=DaruLite::DataFrame.rows(cases, order:
  [:id, :name,
   :car_color1, :car_value1,
   :car_color2, :car_value2,
   :car_color3, :car_value3])
ds.one_to_many([:id],'car_%v%n').to_matrix
#=> Matrix[
#   ["red", "1", 10],
#   ["blue", "1", 20],
#   ["green", "2", 15],
#   ["orange", "2", 30],
#   ["white", "2", 20]
#   ]


113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
# File 'lib/daru_lite/data_frame/joinable.rb', line 113

def one_to_many(parent_fields, pattern)
  vars, numbers = one_to_many_components(pattern)

  DataFrame.new([], order: [*parent_fields, '_col_id', *vars]).tap do |ds|
    each_row do |row|
      verbatim = parent_fields.map { |f| [f, row[f]] }.to_h
      numbers.each do |n|
        generated = one_to_many_row row, n, vars, pattern
        next if generated.values.all?(&:nil?)

        ds.add_row(verbatim.merge(generated).merge('_col_id' => n))
      end
    end
    ds.update
  end
end

#union(other_df) ⇒ Object

Concatenates another DataFrame as #concat. Additionally it tries to preserve the index. If the indices contain common elements, #union will overwrite the according rows in the first dataframe.



20
21
22
23
24
25
26
27
# File 'lib/daru_lite/data_frame/joinable.rb', line 20

def union(other_df)
  index = (@index.to_a + other_df.index.to_a).uniq
  df = row[*(@index.to_a - other_df.index.to_a)]

  df = df.concat(other_df)
  df.index = DaruLite::Index.new(index)
  df
end