Module: DaruLite::DataFrame::Filterable
- Included in: DaruLite::DataFrame
- Defined in: lib/daru_lite/data_frame/filterable.rb
Instance Method Summary
- #filter(axis = :vector) ⇒ Object
  Retain vectors or rows if the block returns a truthy value.
- #filter_rows ⇒ Object
  Iterates over each row and retains it in a new DataFrame if the block returns true for that row.
- #filter_vector(vec) ⇒ Object
  Creates a new vector from the values of the given field, taken from the rows for which the block returns true.
- #filter_vectors(&block) ⇒ Object
  Iterates over each vector and retains it in a new DataFrame if the block returns true for that vector.
- #keep_row_if ⇒ Object
- #keep_vector_if ⇒ Object
- #reject_values(*values) ⇒ DaruLite::DataFrame
  Returns a DataFrame that omits every row containing any of the given values.
- #uniq(*vtrs) ⇒ Object
  Returns the unique rows, compared on the specified vectors or on all vectors if none are given.
- #where(bool_array) ⇒ Object
  Query a DataFrame by passing a DaruLite::Core::Query::BoolArray object.
Instance Method Details
#filter(axis = :vector) ⇒ Object
Retain vectors or rows if the block returns a truthy value.
Description
Use the #filter method to keep only the rows or vectors that satisfy a condition on their values. By default it iterates over vectors and keeps those for which the block returns a truthy value. The optional axis argument selects whether to iterate over vectors or rows.
Arguments
- axis: The axis to iterate over. Can be :vector (or :column) or :row. Defaults to :vector.
Usage
# Filter vectors
df.filter do |vector|
  vector.type == :numeric and vector.median < 50
end

# Filter rows
df.filter(:row) do |row|
  row[:a] + row[:d] < 100
end
# File 'lib/daru_lite/data_frame/filterable.rb', line 71
def filter(axis = :vector, &)
  dispatch_to_axis_pl(axis, :filter, &)
end
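To make the two axes concrete, the following is a minimal plain-Ruby sketch that mimics the behaviour on a Hash of column arrays. It does not use DaruLite: `filter_table` and `table` are hypothetical names chosen for illustration, and the real method returns a DataFrame rather than a Hash.

```ruby
# Plain-Ruby stand-in for a data frame: column name => array of values.
# filter_table is a hypothetical helper, not part of DaruLite.
def filter_table(table, axis = :vector)
  case axis
  when :vector, :column
    # Keep whole columns for which the block is truthy.
    table.select { |_name, values| yield values }
  when :row
    names = table.keys
    # Rebuild each row as a Hash, keep the truthy ones, then regroup by column.
    rows = table.values.transpose.map { |vals| names.zip(vals).to_h }
    kept = rows.select { |row| yield row }
    names.to_h { |name| [name, kept.map { |row| row[name] }] }
  else
    raise ArgumentError, "unknown axis #{axis.inspect}"
  end
end

table = { a: [10, 60, 90], d: [20, 30, 40] }
by_column = filter_table(table) { |values| values.sum < 100 }
by_row = filter_table(table, :row) { |row| row[:a] + row[:d] < 100 }
```

With the default axis the block sees one column at a time; with :row it sees one row at a time, as in the Usage section above.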
#filter_rows ⇒ Object
Iterates over each row and retains it in a new DataFrame if the block returns true for that row.
# File 'lib/daru_lite/data_frame/filterable.rb', line 122
def filter_rows
  return to_enum(:filter_rows) unless block_given?

  keep_rows = @index.map { |index| yield access_row(index) }

  where keep_rows
end
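The source above works in two steps: it builds one block result per row, then hands that list to #where. A plain-Ruby sketch of the same mask-then-select idea follows. `rows_matching` is a hypothetical helper operating on an Array of Hashes, and the selection step is only an assumption about what #where does with the mask.

```ruby
# rows_matching is a hypothetical helper; rows stands in for the frame's rows.
def rows_matching(rows)
  # Step 1: one boolean per row, in row order.
  mask = rows.map { |row| yield(row) ? true : false }
  # Step 2: keep the rows whose mask entry is true.
  rows.zip(mask).select { |_row, keep| keep }.map(&:first)
end

rows = [{ a: 1 }, { a: 5 }, { a: 3 }]
kept = rows_matching(rows) { |row| row[:a] > 2 }
```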
#filter_vector(vec) ⇒ Object
Creates a new vector from the values of the given field, taken from the rows for which the block returns true.
# File 'lib/daru_lite/data_frame/filterable.rb', line 116
def filter_vector(vec, &)
  DaruLite::Vector.new(each_row.select(&).map { |row| row[vec] })
end
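The method body is a select-then-map pipeline: pick the rows that pass the block, then pull one field out of each. A minimal sketch on plain Hashes, where `column_where` is a hypothetical name and the result is an Array rather than a DaruLite::Vector:

```ruby
# column_where is a hypothetical stand-in for the select-then-map pipeline.
def column_where(rows, field, &block)
  rows.select(&block).map { |row| row[field] }
end

people = [
  { name: "Ana", age: 17 },
  { name: "Bo",  age: 42 },
  { name: "Cy",  age: 30 }
]
adult_names = column_where(people, :name) { |row| row[:age] >= 18 }
```

Note that the block receives the whole row, so the condition can test fields other than the one being extracted.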
#filter_vectors(&block) ⇒ Object
Iterates over each vector and retains it in a new DataFrame if the block returns true for that vector.
# File 'lib/daru_lite/data_frame/filterable.rb', line 132
def filter_vectors(&block)
  return to_enum(:filter_vectors) unless block

  dup.tap { |df| df.keep_vector_if(&block) }
end
#keep_row_if ⇒ Object
# File 'lib/daru_lite/data_frame/filterable.rb', line 103
def keep_row_if
  @index.size.times
        .reject { |position| yield(row_at(position)) }
        .reverse_each { |position| delete_at_position(position) }
end
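Judging from the source, this method deletes rows in place: it collects the positions whose block result is falsy, then deletes them from last to first. The reverse order matters because each deletion shifts every later position down by one. The sketch below shows the same pattern on a plain Array; `keep_if_by_position!` is a hypothetical helper, not a DaruLite method.

```ruby
# keep_if_by_position! is a hypothetical helper that mutates a plain Array.
def keep_if_by_position!(items)
  # Positions of the elements the block rejects.
  doomed = items.size.times.reject { |pos| yield items[pos] }
  # Delete from the end so earlier positions are not shifted.
  doomed.reverse_each { |pos| items.delete_at(pos) }
  items
end

numbers = [1, 2, 3, 4, 5, 6]
keep_if_by_position!(numbers, &:even?)
```

Deleting the same positions in ascending order would remove the wrong elements, since the positions were computed against the original, unshifted Array.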
#keep_vector_if ⇒ Object
# File 'lib/daru_lite/data_frame/filterable.rb', line 109
def keep_vector_if
  @vectors.each do |vector|
    delete_vector(vector) unless yield(@data[@vectors[vector]], vector)
  end
end
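Read together, the sources of #filter_vectors and #keep_vector_if suggest a destructive/non-destructive pair: the former duplicates the receiver and runs the latter on the copy. A plain-Ruby sketch of that pairing on a Hash of columns follows; `keep_columns!` and `keep_columns` are hypothetical names. As in the source, the block receives the column values and then the column name.

```ruby
# Destructive variant: removes rejected columns from the Hash it is given.
def keep_columns!(table)
  table.keys.each do |name|
    table.delete(name) unless yield(table[name], name)
  end
  table
end

# Non-destructive variant: works on a shallow copy, leaving the original intact.
def keep_columns(table, &block)
  table.dup.tap { |copy| keep_columns!(copy, &block) }
end

table = { a: [1, 2], b: %w[x y] }
numeric_only = keep_columns(table) { |values, _name| values.all?(Integer) }
```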
#reject_values(*values) ⇒ DaruLite::DataFrame
Returns a DataFrame that omits every row containing any of the given values.
# File 'lib/daru_lite/data_frame/filterable.rb', line 91
def reject_values(*values)
  positions =
    size.times.to_a - @data.flat_map { |vec| vec.positions(*values) }
  # Handle the case when positions size is 1 and #row_at wouldn't return a df
  if positions.size == 1
    pos = positions.first
    row_at(pos..pos)
  else
    row_at(*positions)
  end
end
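The core of the method is a set difference: all row positions minus every position at which any vector holds one of the given values. The sketch below reproduces that computation on plain Arrays. `surviving_positions` is a hypothetical helper, and matching with `include?` is an assumption that may differ from how Vector#positions compares values.

```ruby
# surviving_positions is hypothetical; columns stands in for the frame's vectors.
def surviving_positions(columns, *values)
  row_count = columns.first.to_a.size
  # Every position, in any column, that holds one of the rejected values.
  hits = columns.flat_map do |column|
    column.each_index.select { |pos| values.include?(column[pos]) }
  end
  # Array difference drops all occurrences, so duplicate hits are harmless.
  row_count.times.to_a - hits
end

columns = [[1, nil, 3, 4], ["a", "b", nil, "d"]]
kept = surviving_positions(columns, nil)
```

When exactly one position survives, the source passes a one-element range (pos..pos) to #row_at so that, per its own comment, the result is still a DataFrame.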
#uniq(*vtrs) ⇒ Object
Returns the unique rows, compared on the specified vectors or on all vectors if none are given.
# File 'lib/daru_lite/data_frame/filterable.rb', line 36
def uniq(*vtrs)
  vecs = vtrs.empty? ? vectors.to_a : Array(vtrs)
  grouped = group_by(vecs)
  indexes = grouped.groups.values.map { |v| v[0] }.sort
  row[*indexes]
end
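The source groups rows by the chosen vectors, takes the first index from each group, and sorts those indexes so the surviving rows keep their original order. A plain-Ruby sketch of the same first-occurrence logic, using an Array of Hashes and a hypothetical helper `first_occurrence_positions`:

```ruby
# first_occurrence_positions is hypothetical; rows are plain Hashes.
def first_occurrence_positions(rows, *fields)
  # With no fields given, compare on every key of the first row.
  fields = rows.first.to_h.keys if fields.empty?
  # Group row positions by the values of the comparison fields.
  groups = rows.each_index.group_by { |pos| rows[pos].values_at(*fields) }
  # The first position of each group, restored to original row order.
  groups.values.map(&:first).sort
end

rows = [
  { a: 1, b: "x" },
  { a: 1, b: "y" },
  { a: 2, b: "x" },
  { a: 1, b: "x" }
]
unique_on_a = first_occurrence_positions(rows, :a)
unique_on_all = first_occurrence_positions(rows)
```

Rows that tie on the comparison fields are represented by their earliest occurrence; later duplicates are dropped.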