Module: DaruLite::DataFrame::Joinable
- Included in:
- DaruLite::DataFrame
- Defined in:
- lib/daru_lite/data_frame/joinable.rb
Instance Method Summary
- #concat(other_df) ⇒ Object
  Concatenate another DataFrame along corresponding columns.
- #join(other_df, opts = {}) ⇒ DaruLite::DataFrame
  Join 2 DataFrames with SQL style joins.
- #merge(other_df) ⇒ DaruLite::DataFrame
  Merge vectors from two DataFrames.
- #one_to_many(parent_fields, pattern) ⇒ Object
  Creates a new dataset for one to many relations on a dataset, based on pattern of field names.
- #union(other_df) ⇒ Object
  Concatenates another DataFrame as #concat.
Instance Method Details
#concat(other_df) ⇒ Object
Concatenate another DataFrame along corresponding columns. A column that exists in only one of the two DataFrames is filled with nils for the rows that come from the other.
    # File 'lib/daru_lite/data_frame/joinable.rb', line 6
    def concat(other_df)
      vectors = (@vectors.to_a + other_df.vectors.to_a).uniq

      data = vectors.map do |v|
        get_vector_anyways(v).dup.concat(other_df.get_vector_anyways(v))
      end

      DaruLite::DataFrame.new(data, order: vectors)
    end
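The nil-filling behaviour of #concat can be illustrated without the library. The following is a minimal plain-Ruby sketch, not DaruLite code: it models a frame as a hash of column name to array of values, and `concat_columns` is a hypothetical helper introduced only for this illustration.

```ruby
# Hypothetical stand-in for a DataFrame: { column_name => [values] }.
# Every column of one frame is assumed to have the same length.
def concat_columns(left, right)
  left_rows  = left.values.first ? left.values.first.size : 0
  right_rows = right.values.first ? right.values.first.size : 0

  # Union of column names, keeping first-seen order.
  names = (left.keys + right.keys).uniq

  names.map do |name|
    # A column missing on one side contributes one nil per row of that side.
    l = left.fetch(name)  { Array.new(left_rows) }
    r = right.fetch(name) { Array.new(right_rows) }
    [name, l + r]
  end.to_h
end

a = { x: [1, 2], y: [3, 4] }
b = { x: [5], z: [6] }
result = concat_columns(a, b)
# result[:x] holds all three values; result[:y] ends with one nil;
# result[:z] starts with two nils.
```

In this model, column y is padded for the row that comes from b, and column z is padded for the two rows that come from a.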
#join(other_df, opts = {}) ⇒ DaruLite::DataFrame
Join two DataFrames with SQL-style joins. Inner, left outer, right outer, and full outer joins are currently supported.
    # File 'lib/daru_lite/data_frame/joinable.rb', line 78
    def join(other_df, opts = {})
      DaruLite::Core::Merge.join(self, other_df, opts)
    end
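The four join types named above differ only in which key values survive. The sketch below shows this in plain Ruby, independent of DaruLite::Core::Merge: tables are modelled as arrays of row hashes, and `sql_style_join` with its `key:` and `how:` parameters is a hypothetical helper, not the library's option names. It assumes the two tables share no column other than the key.

```ruby
# Hypothetical model: a table is an array of row hashes.
def sql_style_join(left, right, key:, how: :inner)
  left_keys  = left.map { |row| row[key] }
  right_keys = right.map { |row| row[key] }

  # The join type decides which key values appear in the result.
  keys = case how
         when :inner then left_keys & right_keys
         when :left  then left_keys
         when :right then right_keys
         when :outer then left_keys | right_keys
         else raise ArgumentError, "unknown join type: #{how}"
         end

  # Every output row carries every column; unmatched sides stay nil.
  columns = (left.flat_map(&:keys) + right.flat_map(&:keys)).uniq
  blank = columns.map { |c| [c, nil] }.to_h

  keys.uniq.flat_map do |k|
    ls = left.select  { |row| row[key] == k }
    rs = right.select { |row| row[key] == k }
    ls = [{}] if ls.empty?
    rs = [{}] if rs.empty?
    # Pair every matching left row with every matching right row.
    ls.product(rs).map { |l, r| blank.merge(r).merge(l).merge(key => k) }
  end
end

people = [{ id: 1, name: 'a' }, { id: 2, name: 'b' }]
ages   = [{ id: 2, age: 30 }, { id: 3, age: 40 }]
inner = sql_style_join(people, ages, key: :id, how: :inner) # only id 2
outer = sql_style_join(people, ages, key: :id, how: :outer) # ids 1, 2 and 3
```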
#merge(other_df) ⇒ DaruLite::DataFrame
Merge vectors from two DataFrames. In case of a name collision, the vector names are changed to x_1, x_2, ...
    # File 'lib/daru_lite/data_frame/joinable.rb', line 33
    def merge(other_df)
      unless nrows == other_df.nrows
        raise ArgumentError,
              "Number of rows must be equal in this: #{nrows} and other: #{other_df.nrows}"
      end

      new_fields = (@vectors.to_a + other_df.vectors.to_a)
      new_fields = ArrayHelper.recode_repeated(new_fields)
      DataFrame.new({}, order: new_fields).tap do |df_new|
        (0...nrows).each do |i|
          df_new.add_row row[i].to_a + other_df.row[i].to_a
        end
        df_new.index = @index if @index == other_df.index
        df_new.update
      end
    end
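The renaming on name collision can be pictured with a small plain-Ruby sketch. This is an illustration of the x_1, x_2 scheme described above, not the body of ArrayHelper.recode_repeated; `recode_repeated_names` is a hypothetical name, and leaving unique names untouched is an assumption of this sketch.

```ruby
# Hypothetical helper: suffix each occurrence of a repeated name with
# _1, _2, ... and leave names that occur once unchanged.
def recode_repeated_names(names)
  repeated = names.select { |n| names.count(n) > 1 }.uniq
  seen = Hash.new(0)

  names.map do |name|
    next name unless repeated.include?(name)

    seen[name] += 1
    suffixed = "#{name}_#{seen[name]}"
    # Keep the original type: symbols stay symbols, strings stay strings.
    name.is_a?(Symbol) ? suffixed.to_sym : suffixed
  end
end

left_names  = %i[id score]
right_names = %i[score rank]
merged_names = recode_repeated_names(left_names + right_names)
# :score occurs in both frames, so both occurrences receive a numeric suffix.
```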
#one_to_many(parent_fields, pattern) ⇒ Object
Creates a new dataset for one-to-many relations in a dataset, based on a pattern of field names.
For example, suppose you have a survey on the number of children with this structure:
id, name, child_name_1, child_age_1, child_name_2, child_age_2
With
ds.one_to_many([:id], "child_%v_%n")
the fields named in the first parameter are copied verbatim to the new dataset, and the fields that match the pattern in the second parameter are added as one case for each distinct %n.
    # File 'lib/daru_lite/data_frame/joinable.rb', line 113
    def one_to_many(parent_fields, pattern)
      vars, numbers = one_to_many_components(pattern)

      DataFrame.new([], order: [*parent_fields, '_col_id', *vars]).tap do |ds|
        each_row do |row|
          verbatim = parent_fields.map { |f| [f, row[f]] }.to_h
          numbers.each do |n|
            generated = one_to_many_row row, n, vars, pattern
            next if generated.values.all?(&:nil?)

            ds.add_row(verbatim.merge(generated).merge('_col_id' => n))
          end
        end
        ds.update
      end
    end
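The survey example can be traced with a plain-Ruby sketch that works on row hashes instead of a DataFrame. `one_to_many_rows` is a hypothetical helper written for this illustration. It assumes that %v matches any field-name fragment and %n matches a run of digits, and it stores `_col_id` as an integer; the library's private helpers may differ on these points.

```ruby
# Hypothetical helper operating on an array of row hashes.
def one_to_many_rows(rows, parent_fields, pattern)
  # Turn "child_%v_%n" into a regex with named groups for %v and %n.
  source = Regexp.escape(pattern)
                 .sub('%v') { '(?<var>.+?)' }
                 .sub('%n') { '(?<num>\d+)' }
  regex = Regexp.new("\\A#{source}\\z")

  rows.flat_map do |row|
    parent = parent_fields.map { |f| [f, row[f]] }.to_h

    # Group matching fields by their %n value: { 1 => { "name" => ... } }.
    grouped = Hash.new { |hash, num| hash[num] = {} }
    row.each do |field, value|
      match = regex.match(field.to_s)
      next unless match

      grouped[match[:num].to_i][match[:var]] = value
    end

    # One output row per %n, skipping groups whose values are all nil.
    grouped.sort_by(&:first).reject { |_, vars| vars.values.all?(&:nil?) }
           .map { |num, vars| parent.merge(vars).merge('_col_id' => num) }
  end
end

survey = [
  { id: 1, name: 'Ann', child_name_1: 'Bo', child_age_1: 5,
    child_name_2: 'Cy', child_age_2: 3 },
  { id: 2, name: 'Dee', child_name_1: 'Eli', child_age_1: 7,
    child_name_2: nil, child_age_2: nil }
]
children = one_to_many_rows(survey, [:id], 'child_%v_%n')
# Three rows result: two for id 1 and one for id 2, because the second
# child slot of id 2 is entirely nil and is skipped.
```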
#union(other_df) ⇒ Object
Concatenates another DataFrame like #concat, and additionally tries to preserve the index. If the indices share elements, #union overwrites the corresponding rows of the first DataFrame with the rows of the other.
    # File 'lib/daru_lite/data_frame/joinable.rb', line 20
    def union(other_df)
      index = (@index.to_a + other_df.index.to_a).uniq
      df = row[*(@index.to_a - other_df.index.to_a)]

      df = df.concat(other_df)
      df.index = DaruLite::Index.new(index)
      df
    end
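The documented overwrite rule can be modelled in plain Ruby by keying rows on their index label. This is a sketch of the described behaviour only, not of the method body above; `union_by_index` is a hypothetical helper, and a frame is modelled as a hash of index label to row hash.

```ruby
# Hypothetical model: { index_label => { column_name => value } }.
def union_by_index(left, right)
  # Columns from both frames; missing cells become nil, as with #concat.
  columns = (left.values + right.values).flat_map(&:keys).uniq

  # Hash#merge keeps the labels of `left` in place and lets rows from
  # `right` win whenever both frames contain the same label.
  merged = left.merge(right)
  merged.map do |label, row|
    [label, columns.map { |c| [c, row[c]] }.to_h]
  end.to_h
end

first  = { a: { x: 1 }, b: { x: 2 } }
second = { b: { x: 20, y: 5 }, c: { x: 30, y: 6 } }
combined = union_by_index(first, second)
# Label :b now holds the row from `second`; label :a gains a nil for y.
```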