Class: DaruLite::DataFrame
- Extended by:
- Gem::Deprecate
- Defined in:
- lib/daru_lite/dataframe.rb,
lib/daru_lite/extensions/which_dsl.rb
Defined Under Namespace
Modules: SetCategoricalIndexStrategy, SetMultiIndexStrategy, SetSingleIndexStrategy
Instance Attribute Summary collapse
-
#data ⇒ Object
readonly
TOREMOVE.
-
#index ⇒ Object
The index of the rows of the DataFrame.
-
#name ⇒ Object
readonly
The name of the DataFrame.
-
#size ⇒ Object
readonly
The number of rows present in the DataFrame.
-
#vectors ⇒ Object
The vectors (columns) index of the DataFrame.
Class Method Summary collapse
- ._load(data) ⇒ Object
-
.crosstab_by_assignation(rows, columns, values) ⇒ Object
Generates a new dataset from three vectors: rows, columns and values.
-
.from_activerecord(relation, *fields) ⇒ Object
Read a dataframe from AR::Relation.
-
.from_csv(path, opts = {}, &block) ⇒ Object
Load data from a CSV file.
-
.from_excel(path, opts = {}, &block) ⇒ Object
Read data from an Excel file into a DataFrame.
-
.from_plaintext(path, fields) ⇒ Object
Read the database from a plaintext file.
-
.from_sql(dbh, query) ⇒ Object
Execute a database query and return the result as a DataFrame.
-
.rows(source, opts = {}) ⇒ Object
Create DataFrame by specifying rows as an Array of Arrays or Array of DaruLite::Vector objects.
Instance Method Summary collapse
- #==(other) ⇒ Object
-
#[](*names) ⇒ Object
Access row or vector.
-
#[]=(*args) ⇒ Object
Insert a new row/vector of the specified name or modify a previous row.
- #_dump(_depth) ⇒ Object
-
#access_row_tuples_by_indexs(*indexes) ⇒ Array
Returns array of row tuples at given index(s).
-
#add_level_to_vectors(top_level_label) ⇒ Object
Converts the vectors to a DaruLite::MultiIndex.
- #add_row(row, index = nil) ⇒ Object
- #add_vector(n, vector) ⇒ Object
- #add_vectors_by_split(name, join = '-', sep = DaruLite::SPLIT_TOKEN) ⇒ Object
- #add_vectors_by_split_recode(nm, join = '-', sep = DaruLite::SPLIT_TOKEN) ⇒ Object
-
#aggregate(options = {}, multi_index_level = -1) ⇒ DaruLite::DataFrame
Function to use for aggregating the data.
-
#all?(axis = :vector, &block) ⇒ Boolean
Works like Array#all?.
-
#any?(axis = :vector, &block) ⇒ Boolean
Works like Array#any?.
- #apply_method(method, keys: nil, by_position: true) ⇒ Object (also: #apply_method_on_sub_df)
-
#at(*positions) ⇒ DaruLite::Vector, DaruLite::DataFrame
Retrieve vectors by positions.
-
#bootstrap(n = nil) ⇒ DaruLite::DataFrame
Creates a DataFrame of n rows drawn at random (with replacement) from this DataFrame.
-
#clone(*vectors_to_clone) ⇒ Object
Returns a ‘view’ of the DataFrame, i.e. the object IDs of the vectors are preserved.
-
#clone_only_valid ⇒ Object
Returns a ‘shallow’ copy of DataFrame if missing data is not present, or a full copy of only valid data if missing data is present.
-
#clone_structure ⇒ Object
Only clone the structure of the DataFrame.
-
#collect(axis = :vector, &block) ⇒ Object
Iterate over a row or vector and return results in a DaruLite::Vector.
-
#collect_matrix ⇒ ::Matrix
Generate a matrix, based on vector names of the DataFrame.
- #collect_row_with_index(&block) ⇒ Object
-
#collect_rows(&block) ⇒ Object
Retrieves a DaruLite::Vector, based on the result of calculation performed on each row.
- #collect_vector_with_index(&block) ⇒ Object
-
#collect_vectors(&block) ⇒ Object
Retrieves a DaruLite::Vector, based on the result of calculation performed on each vector.
-
#compute(text, &block) ⇒ Object
Returns a vector computed from a string expression that refers to the DataFrame's vectors by name.
-
#concat(other_df) ⇒ Object
Concatenate another DataFrame along corresponding columns.
-
#create_sql(table, charset = 'UTF8') ⇒ Object
Create a SQL CREATE TABLE statement based on the DataFrame.
-
#delete_row(index) ⇒ Object
Delete a row.
-
#delete_vector(vector) ⇒ Object
Delete a vector.
-
#delete_vectors(*vectors) ⇒ Object
Deletes a list of vectors.
-
#dup(vectors_to_dup = nil) ⇒ Object
Duplicate the DataFrame entirely.
-
#dup_only_valid(vecs = nil) ⇒ Object
Creates a new duplicate dataframe containing only rows without a single missing value.
-
#each(axis = :vector, &block) ⇒ Object
Iterate over each row or vector of the DataFrame.
-
#each_index(&block) ⇒ Object
Iterate over each index of the DataFrame.
-
#each_row ⇒ Object
Iterate over each row.
- #each_row_with_index ⇒ Object
-
#each_vector(&block) ⇒ Object
(also: #each_column)
Iterate over each vector.
-
#each_vector_with_index ⇒ Object
(also: #each_column_with_index)
Iterate over each vector along with the name of the vector.
-
#filter(axis = :vector, &block) ⇒ Object
Retain vectors or rows if the block returns a truthy value.
-
#filter_rows ⇒ Object
Iterates over each row and retains it in a new DataFrame if the block returns true for that row.
-
#filter_vector(vec, &block) ⇒ Object
Creates a new vector with the data of a given field for which the block returns true.
-
#filter_vectors(&block) ⇒ Object
Iterates over each vector and retains it in a new DataFrame if the block returns true for that vector.
-
#get_sub_dataframe(keys, by_position: true) ⇒ DaruLite::DataFrame
Extract a dataframe given row indexes or positions.
- #get_vector_anyways(v) ⇒ Object
-
#group_by(*vectors) ⇒ Object
Group elements by vector to perform operations on them.
- #group_by_and_aggregate(*group_by_keys, **aggregation_map) ⇒ Object
- #has_missing_data? ⇒ Boolean (also: #flawed?)
-
#has_vector?(vector) ⇒ Boolean
Check if a vector is present.
-
#head(quantity = 10) ⇒ Object
(also: #first)
The first ten elements of the DataFrame.
-
#include_values?(*values) ⇒ true, false
Check if any of given values occur in the data frame.
-
#initialize(source = {}, opts = {}) ⇒ DataFrame
constructor
DataFrame basically consists of an Array of Vector objects.
- #insert_vector(n, name, source) ⇒ Object
-
#inspect(spacing = DaruLite.spacing, threshold = DaruLite.max_rows) ⇒ Object
Pretty print in a nice table format for the command line (irb/pry/iruby).
- #interact_code(vector_names, full) ⇒ Object
-
#join(other_df, opts = {}) ⇒ DaruLite::DataFrame
Join 2 DataFrames with SQL style joins.
- #keep_row_if ⇒ Object
- #keep_vector_if ⇒ Object
-
#map(axis = :vector, &block) ⇒ Object
Map over each vector or row of the data frame according to the argument specified.
-
#map!(axis = :vector, &block) ⇒ Object
Destructive map.
-
#map_rows(&block) ⇒ Object
Map each row.
- #map_rows! ⇒ Object
- #map_rows_with_index(&block) ⇒ Object
-
#map_vectors(&block) ⇒ Object
Map each vector and return an Array.
-
#map_vectors! ⇒ Object
Destructive form of #map_vectors.
-
#map_vectors_with_index(&block) ⇒ Object
Map vectors along with the index.
-
#merge(other_df) ⇒ DaruLite::DataFrame
Merge vectors from two DataFrames.
- #method_missing(name, *args, &block) ⇒ Object
-
#missing_values_rows(missing_values = [nil]) ⇒ Object
(also: #vector_missing_values)
Return a vector with the number of missing values in each row.
-
#ncols ⇒ Object
The number of vectors.
-
#nest(*tree_keys, &block) ⇒ Object
Return a nested hash using vector names as keys and an array constructed of hashes with other values.
-
#nrows ⇒ Object
The number of rows.
- #numeric_vector_names ⇒ Object
-
#numeric_vectors ⇒ Object
Return the indexes of all the numeric vectors.
-
#one_to_many(parent_fields, pattern) ⇒ Object
Creates a new dataset for one to many relations on a dataset, based on pattern of field names.
-
#only_numerics(opts = {}) ⇒ Object
Return a DataFrame of only the numerical Vectors.
-
#order=(order_array) ⇒ Object
Reorder the vectors in a dataframe.
-
#pivot_table(opts = {}) ⇒ Object
Pivots a data frame on specified vectors and applies an aggregate function to quickly generate a summary.
-
#recode(axis = :vector, &block) ⇒ Object
Maps over the DataFrame and returns a DataFrame.
- #recode_rows ⇒ Object
- #recode_vectors ⇒ Object
-
#reindex(new_index) ⇒ Object
Change the index of the DataFrame and preserve the labels of the previous indexing.
- #reindex_vectors(new_vectors) ⇒ Object
-
#reject_values(*values) ⇒ DaruLite::DataFrame
Returns a dataframe in which rows with any of the mentioned values are ignored.
-
#rename(new_name) ⇒ Object
(also: #name=)
Rename the DataFrame.
-
#rename_vectors(name_map) ⇒ Object
Renames the vectors.
-
#rename_vectors!(name_map) ⇒ Object
Renames the vectors and returns itself.
-
#replace_values(old_values, new_value) ⇒ DaruLite::DataFrame
Replace specified values with given value.
- #reset_index ⇒ Object
- #respond_to_missing?(name, include_private = false) ⇒ Boolean
- #rolling_fillna(direction = :forward) ⇒ Object
-
#rolling_fillna!(direction = :forward) ⇒ Object
Rolling fillna: replaces all Float::NAN and nil values with the preceding or following value.
-
#rotate_vectors(count = -1) ⇒ Object
Return the dataframe with rotated vector positions; the vector at position count becomes the first vector of the dataframe.
-
#row ⇒ Object
Access a row or set/create a row.
-
#row_at(*positions) ⇒ DaruLite::Vector, DaruLite::DataFrame
Retrieve rows by positions.
-
#save(filename) ⇒ Object
Use marshalling to save dataframe to a file.
-
#set_at(positions, vector) ⇒ Object
Set vectors by positions.
-
#set_index(new_index_col, keep: false, categorical: false) ⇒ Object
Set a particular column as the new index of the DataFrame.
-
#set_row_at(positions, vector) ⇒ Object
Set rows by positions.
-
#shape ⇒ Object
Return the number of rows and columns of the DataFrame in an Array.
-
#sort(vector_order, opts = {}) ⇒ Object
Non-destructive version of #sort!.
-
#sort!(vector_order, opts = {}) ⇒ Object
Sorts a dataframe (ascending/descending) in the given priority sequence of vectors, with or without a block.
-
#split_by_category(cat_name) ⇒ Array
Split the dataframe into many dataframes based on category vector.
-
#summary ⇒ String
Generate a summary of this DataFrame based on individual vectors in the DataFrame.
-
#tail(quantity = 10) ⇒ Object
(also: #last)
The last ten elements of the DataFrame.
-
#to_a ⇒ Object
Converts the DataFrame into an array of hashes where key is vector name and value is the corresponding element.
-
#to_category(*names) ⇒ DaruLite::DataFrame
Converts the specified non category type vectors to category type vectors.
-
#to_df ⇒ self
Returns the dataframe.
-
#to_h ⇒ Object
Converts DataFrame to a hash (explicit) with keys as vector names and values as the corresponding vectors.
-
#to_html(threshold = DaruLite.max_rows) ⇒ Object
Convert to html for IRuby.
- #to_html_tbody(threshold = DaruLite.max_rows) ⇒ Object
- #to_html_thead ⇒ Object
-
#to_json(no_index = true) ⇒ Object
Convert to json.
-
#to_matrix ⇒ Object
Convert all vectors of type :numeric into a Matrix.
- #to_s ⇒ Object
-
#transpose ⇒ Object
Transpose a DataFrame, transposing elements and swapping the row and column indexes.
-
#union(other_df) ⇒ Object
Concatenates another DataFrame as #concat.
-
#uniq(*vtrs) ⇒ Object
Return unique rows by vector specified or all vectors.
-
#update ⇒ Object
Method for updating the metadata (i.e. missing value positions) of the DataFrame after assignment, deletion, etc.
-
#vector_by_calculation(&block) ⇒ Object
DSL for yielding each row and returning a DaruLite::Vector based on the value each run of the block returns.
- #vector_count_characters(vecs = nil) ⇒ Object
-
#vector_mean(max_missing = 0) ⇒ Object
Calculate mean of the rows of the dataframe.
-
#vector_sum(*args) ⇒ Object
Sum all numeric/specified vectors in the DataFrame.
-
#verify(*tests) ⇒ Object
Test each row with one or more tests.
-
#where(bool_array) ⇒ Object
Query a DataFrame by passing a DaruLite::Core::Query::BoolArray object.
- #which(&block) ⇒ Object
-
#write_csv(filename, opts = {}) ⇒ Object
Write this DataFrame to a CSV file.
-
#write_excel(filename, opts = {}) ⇒ Object
Write this dataframe to an Excel Spreadsheet.
-
#write_sql(dbh, table) ⇒ Object
Insert each case of the Dataset on the selected table.
Methods included from Maths::Statistics::DataFrame
#acf, #correlation, #count, #covariance, #cumsum, #describe, #ema, #max, #mean, #median, #min, #mode, #percent_change, #product, #range, #rolling_count, #rolling_max, #rolling_mean, #rolling_median, #rolling_min, #rolling_std, #rolling_variance, #standardize, #std, #sum, #variance_sample
Methods included from Maths::Arithmetic::DataFrame
#%, #*, #**, #+, #-, #/, #exp, #round, #sqrt
Constructor Details
#initialize(source = {}, opts = {}) ⇒ DataFrame
A DataFrame basically consists of an Array of Vector objects. These objects are indexed by row through the index Index object and by column through the vectors Index object.
Arguments
-
source - Source from which the DataFrame is to be initialized. Can be a Hash of names and vectors (Array or DaruLite::Vector), an Array of Arrays or an Array of DaruLite::Vectors.
Options
:order
- An Array/DaruLite::Index/DaruLite::MultiIndex containing the order in which Vectors should appear in the DataFrame.
:index
- An Array/DaruLite::Index/DaruLite::MultiIndex containing the order in which rows of the DataFrame will be named.
:name
- A name for the DataFrame.
:clone
- Specify as true or false. When set to false and Vector objects are passed as the source, the Vector objects will not be duplicated when creating the DataFrame. Has no effect if an Array is passed as the source, or if the passed DaruLite::Vectors have different indexes. Defaults to true.
Usage
df = DaruLite::DataFrame.new
# =>
# <DaruLite::DataFrame(0x0)>
# Creates an empty DataFrame with no rows or columns.
df = DaruLite::DataFrame.new({}, order: [:a, :b])
# =>
# <DaruLite::DataFrame(0x2)>
#   a b
# Creates a DataFrame with no rows and columns :a and :b
df = DaruLite::DataFrame.new({a: [1,2,3,4], b: [6,7,8,9]}, order: [:b, :a],
index: [:a, :b, :c, :d], name: :spider_man)
# =>
# <DaruLite::DataFrame:80766980 @name = spider_man @size = 4>
# b a
# a 6 1
# b 7 2
# c 8 3
# d 9 4
df = DaruLite::DataFrame.new([[1,2,3,4],[6,7,8,9]], name: :bat_man)
# =>
# #<DaruLite::DataFrame: bat_man (4x2)>
# 0 1
# 0 1 6
# 1 2 7
# 2 3 8
# 3 4 9
# Dataframe having Index name
df = DaruLite::DataFrame.new({a: [1,2,3,4], b: [6,7,8,9]}, order: [:b, :a],
index: DaruLite::Index.new([:a, :b, :c, :d], name: 'idx_name'),
name: :spider_man)
# =>
# <DaruLite::DataFrame:80766980 @name = spider_man @size = 4>
# idx_name b a
# a 6 1
# b 7 2
# c 8 3
# d 9 4
idx = DaruLite::Index.new [100, 99, 101, 1, 2], name: "s1"
# => #<DaruLite::Index(5): s1 {100, 99, 101, 1, 2}>
df = DaruLite::DataFrame.new({b: [11,12,13,14,15], a: [1,2,3,4,5],
c: [11,22,33,44,55]},
order: [:a, :b, :c],
index: idx)
# =>
# <DaruLite::DataFrame(5x3)>
# s1 a b c
# 100 1 11 11
# 99 2 12 22
# 101 3 13 33
# 1 4 14 44
# 2 5 15 55
# File 'lib/daru_lite/dataframe.rb', line 299

def initialize(source = {}, opts = {})
  vectors = opts[:order]
  index = opts[:index] # FIXME: just keyword args after Ruby 2.1
  @data = []
  @name = opts[:name]

  case source
  when [], {}
    create_empty_vectors(vectors, index)
  when Array
    initialize_from_array source, vectors, index, opts
  when Hash
    initialize_from_hash source, vectors, index, opts
  end

  set_size
  validate
  update
end
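The `case source` dispatch in the constructor relies on `Array#===` and `Hash#===` comparing by equality, so an empty Array and an empty Hash both take the first branch. The following is a minimal plain-Ruby sketch of that dispatch only; `source_kind` and its `:unsupported` fallback are hypothetical and not part of DaruLite.

```ruby
# Hypothetical helper (not a DaruLite method): reports which branch of the
# constructor's `case source` a given argument would take.
def source_kind(source)
  case source
  when [], {} then :empty        # [] === x and {} === x compare with ==
  when Array  then :array        # non-empty Array of Arrays or Vectors
  when Hash   then :hash         # non-empty Hash of name => values
  else :unsupported              # assumption: the real constructor has no else branch
  end
end

source_kind({})                  # => :empty
source_kind([])                  # => :empty
source_kind([[1, 2], [3, 4]])    # => :array
source_kind({ a: [1, 2] })       # => :hash
```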
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(name, *args, &block) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 2251

def method_missing(name, *args, &block)
  if /(.+)=/.match?(name)
    name = name[/(.+)=/].delete('=')
    name = name.to_sym unless has_vector?(name)
    insert_or_modify_vector [name], args[0]
  elsif has_vector?(name)
    self[name]
  elsif has_vector?(name.to_s)
    self[name.to_s]
  else
    super
  end
end
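The listing above lets callers read and write vectors as if they were attributes. Here is a plain-Ruby sketch of the same pattern, assuming a hypothetical `TinyFrame` class that stores columns in a Hash instead of DaruLite vectors; it is only meant to illustrate the lookup order (writer form, Symbol name, String name, then `super`).

```ruby
# TinyFrame is a hypothetical stand-in for DataFrame; columns live in a Hash.
class TinyFrame
  def initialize(columns)
    @columns = columns
  end

  def has_vector?(name)
    @columns.key?(name)
  end

  def method_missing(name, *args, &block)
    if /(.+)=/.match?(name)                     # writer form: frame.c = [...]
      key = name[/(.+)=/].delete('=')
      key = key.to_sym unless has_vector?(key)  # keep an existing String key
      @columns[key] = args[0]
    elsif has_vector?(name)                     # reader form, Symbol key
      @columns[name]
    elsif has_vector?(name.to_s)                # reader form, String key
      @columns[name.to_s]
    else
      super                                     # raises NoMethodError
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?('=') || has_vector?(name) || has_vector?(name.to_s) || super
  end
end

frame = TinyFrame.new({ a: [1, 2], 'b' => [3, 4] })
frame.a           # => [1, 2]
frame.b           # => [3, 4]
frame.c = [5, 6]  # creates a new column under the key :c
frame.c           # => [5, 6]
```

Defining `respond_to_missing?` alongside `method_missing` keeps `respond_to?` consistent with the dynamic methods.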
Instance Attribute Details
#data ⇒ Object (readonly)
TOREMOVE
# File 'lib/daru_lite/dataframe.rb', line 199

def data
  @data
end
#index ⇒ Object
The index of the rows of the DataFrame
# File 'lib/daru_lite/dataframe.rb', line 202

def index
  @index
end
#name ⇒ Object (readonly)
The name of the DataFrame
# File 'lib/daru_lite/dataframe.rb', line 205

def name
  @name
end
#size ⇒ Object (readonly)
The number of rows present in the DataFrame
# File 'lib/daru_lite/dataframe.rb', line 208

def size
  @size
end
#vectors ⇒ Object
The vectors (columns) index of the DataFrame
# File 'lib/daru_lite/dataframe.rb', line 197

def vectors
  @vectors
end
Class Method Details
._load(data) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 2184

def self._load(data)
  h = Marshal.load data
  DaruLite::DataFrame.new(h[:data],
                          index: h[:index],
                          order: h[:order],
                          name: h[:name])
end
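`._load` is the class-side half of Ruby's custom marshalling protocol; `Marshal.dump` calls the instance method `_dump` and `Marshal.load` hands the resulting String back to `_load`. A minimal round-trip sketch, assuming a hypothetical `MarshalFrame` class with plain Arrays in place of DaruLite vectors and indexes:

```ruby
# MarshalFrame is hypothetical; it only mirrors the _dump/_load pairing.
class MarshalFrame
  attr_reader :data, :index, :order, :name

  def initialize(data, index:, order:, name: nil)
    @data = data
    @index = index
    @order = order
    @name = name
  end

  # Called by Marshal.dump; must return a String.
  def _dump(_depth)
    Marshal.dump(data: @data, index: @index, order: @order, name: @name)
  end

  # Called by Marshal.load with the String produced by _dump.
  def self._load(string)
    h = Marshal.load(string)
    new(h[:data], index: h[:index], order: h[:order], name: h[:name])
  end
end

original = MarshalFrame.new([[1, 2], [3, 4]], index: %i[x y], order: %i[a b], name: :demo)
copy = Marshal.load(Marshal.dump(original))
copy.order  # => [:a, :b]
```

As with any use of `Marshal.load`, the input should come from a trusted source.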
.crosstab_by_assignation(rows, columns, values) ⇒ Object
Generates a new dataset, using three vectors:
- Rows
- Columns
- Values
For example, you have these values
x y v
a a 0
a b 1
b a 1
b b 0
You obtain
id a b
a 0 1
b 1 0
Useful to process outputs from databases
# File 'lib/daru_lite/dataframe.rb', line 155

def crosstab_by_assignation(rows, columns, values)
  raise 'Three vectors should be equal size' if
    rows.size != columns.size || rows.size != values.size

  data = Hash.new do |h, col|
    h[col] = rows.factors.map { |r| [r, nil] }.to_h
  end
  columns.zip(rows, values).each { |c, r, v| data[c][r] = v }

  # FIXME: in fact, WITHOUT this line you'll obtain more "right"
  # data: with vectors having "rows" as an index...
  data = data.transform_values(&:values)
  data[:_id] = rows.factors

  DataFrame.new(data)
end
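The x/y/v example above can be traced with plain Arrays. This sketch follows the same steps as the listing but is not the library code: the `crosstab` name is hypothetical, `Array#uniq` stands in for the vector's `factors`, and a Hash of columns is returned instead of a DataFrame.

```ruby
# Hypothetical plain-Ruby version of the cross-tabulation steps.
def crosstab(rows, columns, values)
  if rows.size != columns.size || rows.size != values.size
    raise ArgumentError, 'Three arrays should be equal size'
  end

  row_keys = rows.uniq
  # One Hash per column name, pre-filled with nil for every row key.
  table = Hash.new { |h, col| h[col] = row_keys.to_h { |r| [r, nil] } }
  columns.zip(rows, values).each { |c, r, v| table[c][r] = v }

  result = table.transform_values(&:values)
  result[:_id] = row_keys
  result
end

crosstab(%w[a a b b], %w[a b a b], [0, 1, 1, 0])
# => { 'a' => [0, 1], 'b' => [1, 0], _id: ['a', 'b'] }
```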
.from_activerecord(relation, *fields) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 99

def from_activerecord(relation, *fields)
  DaruLite::IO.from_activerecord relation, *fields
end
.from_csv(path, opts = {}, &block) ⇒ Object
Load data from a CSV file. Specify an optional block to grab the CSV object and pre-condition it (for example, use the `convert` or `header_convert` methods).
Arguments
-
path - Local path / Remote URL of the file to load specified as a String.
Options
Accepts the same options as the DaruLite::DataFrame constructor and CSV.open() and uses those to eventually construct the resulting DataFrame.
Verbose Description
You can pass the `.from_csv` function all the options that you would pass to the Ruby `CSV.read()` function, since this is what is used internally.
For example, if the columns in your CSV file are separated by something other than commas, you can use the `:col_sep` option. If you want to convert numeric values to numbers and not keep them as strings, you can use the `:converters` option and set it to `:numeric`.
The `.from_csv` function uses the following defaults for reading CSV files (that are passed into the `CSV.read()` function):
{
:col_sep => ',',
:converters => :numeric
}
# File 'lib/daru_lite/dataframe.rb', line 46

def from_csv(path, opts = {}, &block)
  DaruLite::IO.from_csv path, opts, &block
end
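To see what the `:col_sep` and `:converters` options do on their own, the sketch below calls Ruby's `CSV` library directly rather than `.from_csv`. The sample text, the `;` separator and the `headers: true` option are assumptions made for illustration.

```ruby
require 'csv'

# Semicolon-separated input; without converters every cell would be a String.
text = "id;score\n1;2.5\n2;4\n"

table = CSV.parse(text, headers: true, col_sep: ';', converters: :numeric)

# Collect the parsed table column by column, keyed by header.
columns = table.headers.to_h { |header| [header, table[header]] }
# columns == { 'id' => [1, 2], 'score' => [2.5, 4] }
```

The `:numeric` converter tries Integer first and Float second, which is why `4` stays an Integer while `2.5` becomes a Float.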
.from_excel(path, opts = {}, &block) ⇒ Object
Read data from an Excel file into a DataFrame.
Arguments
-
path - Path of the file to be read.
Options
:worksheet_id - ID of the worksheet that is to be read.
# File 'lib/daru_lite/dataframe.rb', line 59

def from_excel(path, opts = {}, &block)
  DaruLite::IO.from_excel path, opts, &block
end
.from_plaintext(path, fields) ⇒ Object
Read the database from a plaintext file. For this method to work, the data should be present in a plain text file in columns. See spec/fixtures/bank2.dat for an example.
Arguments
-
path - Path of the file to be read.
-
fields - Vector names of the resulting database.
Usage
df = DaruLite::DataFrame.from_plaintext 'spec/fixtures/bank2.dat', [:v1,:v2,:v3,:v4,:v5,:v6]
# File 'lib/daru_lite/dataframe.rb', line 115

def from_plaintext(path, fields)
  DaruLite::IO.from_plaintext path, fields
end
.from_sql(dbh, query) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 79

def from_sql(dbh, query)
  DaruLite::IO.from_sql dbh, query
end
.rows(source, opts = {}) ⇒ Object
Create DataFrame by specifying rows as an Array of Arrays or Array of DaruLite::Vector objects.
# File 'lib/daru_lite/dataframe.rb', line 121

def rows(source, opts = {})
  raise SizeError, 'All vectors must have same length' \
    unless source.all? { |v| v.size == source.first.size }

  opts[:order] ||= guess_order(source)

  if ArrayHelper.array_of?(source, Array) || source.empty?
    DataFrame.new(source.transpose, opts)
  elsif ArrayHelper.array_of?(source, Vector)
    from_vector_rows(source, opts)
  else
    raise ArgumentError, "Can't create DataFrame from #{source}"
  end
end
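For an Array of Arrays, the listing above boils down to a length check followed by `Array#transpose`, which turns row-major data into the column-major form the constructor expects. A plain-Ruby sketch of that step; `rows_to_columns` is a hypothetical name, and it raises `ArgumentError` where the library raises its own `SizeError`.

```ruby
# Hypothetical helper: turn an Array of equally sized rows into named columns.
def rows_to_columns(rows, order)
  unless rows.all? { |r| r.size == rows.first.size }
    raise ArgumentError, 'All rows must have same length'
  end

  order.zip(rows.transpose).to_h
end

rows_to_columns([[1, 'x'], [2, 'y'], [3, 'z']], %i[num label])
# => { num: [1, 2, 3], label: ['x', 'y', 'z'] }
```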
Instance Method Details
#==(other) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 2226

def ==(other)
  self.class == other.class &&
    @size == other.size &&
    @index == other.index &&
    @vectors == other.vectors &&
    @vectors.to_a.all? { |v| self[v] == other[v] }
end
#[](*names) ⇒ Object
Access a row or vector. Specify the name of the row/vector followed by the axis (:row or :vector); the axis defaults to :vector. Using this method to access rows is not recommended; use df.row[:a] to access the row with index :a.
# File 'lib/daru_lite/dataframe.rb', line 322

def [](*names)
  axis = extract_axis(names, :vector)
  dispatch_to_axis axis, :access, *names
end
#[]=(*args) ⇒ Object
Insert a new row/vector of the specified name or modify a previous row. Instead of using this method directly, use df.row[:a] = [1,2,3] to set/create a row :a to [1,2,3], or df.vector[:a] = [1,2,3] for vectors.
If a DaruLite::Vector is specified after the equals sign, the indexes of the vector will be matched against the row/vector indexes of the DataFrame before the insertion is performed. Unmatched indexes will be set to nil.
# File 'lib/daru_lite/dataframe.rb', line 464

def []=(*args)
  vector = args.pop
  axis = extract_axis(args)
  names = args

  dispatch_to_axis axis, :insert_or_modify, names, vector
end
#_dump(_depth) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 2175

def _dump(_depth)
  Marshal.dump(
    data: @data,
    index: @index.to_a,
    order: @vectors.to_a,
    name: @name
  )
end
#access_row_tuples_by_indexs(*indexes) ⇒ Array
Returns array of row tuples at given index(s)
# File 'lib/daru_lite/dataframe.rb', line 2340

def access_row_tuples_by_indexs(*indexes)
  return get_sub_dataframe(indexes, by_position: false).map_rows(&:to_a) if
    @index.is_a?(DaruLite::MultiIndex)

  positions = @index.pos(*indexes)
  if positions.is_a? Numeric
    row = get_rows_for([positions])
    row.first.is_a?(Array) ? row : [row]
  else
    new_rows = get_rows_for(indexes, by_position: false)
    indexes.map { |index| new_rows.map { |r| r[index] } }
  end
end
#add_level_to_vectors(top_level_label) ⇒ Object
Converts the vectors to a DaruLite::MultiIndex. The argument passed is used as the MultiIndex’s top level
# File 'lib/daru_lite/dataframe.rb', line 1697

def add_level_to_vectors(top_level_label)
  tuples = vectors.map { |label| [top_level_label, *label] }
  self.vectors = DaruLite::MultiIndex.from_tuples(tuples)
end
#add_row(row, index = nil) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 472

def add_row(row, index = nil)
  self.row[*(index || @size)] = row
end
#add_vector(n, vector) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 476

def add_vector(n, vector)
  self[n] = vector
end
#add_vectors_by_split(name, join = '-', sep = DaruLite::SPLIT_TOKEN) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 1271

def add_vectors_by_split(name, join = '-', sep = DaruLite::SPLIT_TOKEN)
  self[name]
    .split_by_separator(sep)
    .each { |k, v| self[:"#{name}#{join}#{k}"] = v }
end
#add_vectors_by_split_recode(nm, join = '-', sep = DaruLite::SPLIT_TOKEN) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 2001

def add_vectors_by_split_recode(nm, join = '-', sep = DaruLite::SPLIT_TOKEN)
  self[nm]
    .split_by_separator(sep)
    .each_with_index do |(k, v), i|
      v.rename "#{nm}:#{k}"
      self[:"#{nm}#{join}#{i + 1}"] = v
    end
end
#aggregate(options = {}, multi_index_level = -1) ⇒ DaruLite::DataFrame
Function to use for aggregating the data.
Note: the `GroupBy` class's `aggregate` method uses this `aggregate` method internally.
# File 'lib/daru_lite/dataframe.rb', line 2401

def aggregate(options = {}, multi_index_level = -1)
  if block_given?
    positions_tuples, new_index = yield(@index) # NOTE: use of yield is private for now
  else
    positions_tuples, new_index = group_index_for_aggregation(@index, multi_index_level)
  end

  colmn_value = aggregate_by_positions_tuples(options, positions_tuples)

  DaruLite::DataFrame.new(colmn_value, index: new_index, order: options.keys)
end
#all?(axis = :vector, &block) ⇒ Boolean
Works like Array#all?
# File 'lib/daru_lite/dataframe.rb', line 1328

def all?(axis = :vector, &block)
  if %i[vector column].include?(axis)
    @data.all?(&block)
  elsif axis == :row
    each_row.all?(&block)
  else
    raise ArgumentError, "Unidentified axis #{axis}"
  end
end
#any?(axis = :vector, &block) ⇒ Boolean
Works like Array#any?.
# File 'lib/daru_lite/dataframe.rb', line 1306

def any?(axis = :vector, &block)
  if %i[vector column].include?(axis)
    @data.any?(&block)
  elsif axis == :row
    each_row do |row|
      return true if yield(row)
    end
    false
  else
    raise ArgumentError, "Unidentified axis #{axis}"
  end
end
#apply_method(method, keys: nil, by_position: true) ⇒ Object Also known as: apply_method_on_sub_df
# File 'lib/daru_lite/dataframe.rb', line 957

def apply_method(method, keys: nil, by_position: true)
  df = keys ? get_sub_dataframe(keys, by_position: by_position) : self

  case method
  when Symbol then df.send(method)
  when Proc then method.call(df)
  when Array then method.map(&:to_proc).map { |proc| proc.call(df) } # works with Array of both Symbol and/or Proc
  else raise
  end
end
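The `method` argument can be a Symbol, a Proc, or an Array mixing both. The sketch below reproduces only that dispatch on a plain Array target; `apply_to` is a hypothetical name, and it raises an explicit `ArgumentError` where the listing uses a bare `raise`.

```ruby
# Hypothetical helper mirroring the Symbol / Proc / Array dispatch.
def apply_to(target, method)
  case method
  when Symbol then target.send(method)
  when Proc   then method.call(target)
  # Symbol#to_proc and Proc#to_proc give a uniform callable for each entry.
  when Array  then method.map(&:to_proc).map { |callable| callable.call(target) }
  else raise ArgumentError, "Unsupported method spec: #{method.inspect}"
  end
end

apply_to([3, 1, 2], :max)                         # => 3
apply_to([3, 1, 2], ->(a) { a.sum })              # => 6
apply_to([3, 1, 2], [:min, ->(a) { a.size }])     # => [1, 3]
```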
#at(*positions) ⇒ DaruLite::Vector, DaruLite::DataFrame
Retrieve vectors by positions.
# File 'lib/daru_lite/dataframe.rb', line 402

def at(*positions)
  if AXES.include? positions.last
    axis = positions.pop
    return row_at(*positions) if axis == :row
  end

  original_positions = positions
  positions = coerce_positions(*positions, ncols)
  validate_positions(*positions, ncols)

  if positions.is_a? Integer
    @data[positions].dup
  else
    DaruLite::DataFrame.new positions.map { |pos| @data[pos].dup },
                            index: @index,
                            order: @vectors.at(*original_positions),
                            name: @name
  end
end
#bootstrap(n = nil) ⇒ DaruLite::DataFrame
Creates a DataFrame of n rows drawn at random (with replacement) from this DataFrame. If n is not given, uses the original number of rows.
# File 'lib/daru_lite/dataframe.rb', line 1052

def bootstrap(n = nil)
  n ||= nrows
  DaruLite::DataFrame.new({}, order: @vectors).tap do |df_boot|
    n.times do
      df_boot.add_row(row[rand(n)])
    end
    df_boot.update
  end
end
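Bootstrapping here means sampling rows independently, so the same row can appear more than once. A plain-Ruby sketch of the idea on an Array of rows follows; `bootstrap_rows` and its `rng:` keyword are hypothetical. Note one deliberate difference: the sketch draws positions from the number of available rows, whereas the listing above calls `rand(n)` with the requested sample size.

```ruby
# Hypothetical helper: draw n rows with replacement from an Array of rows.
def bootstrap_rows(rows, n = nil, rng: Random.new)
  n ||= rows.size
  Array.new(n) { rows[rng.rand(rows.size)] }
end

rows = [[1, 'a'], [2, 'b'], [3, 'c']]
sample = bootstrap_rows(rows, 5, rng: Random.new(42))
sample.size  # => 5
```

Passing a seeded `Random` makes a run reproducible, which is convenient in tests.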
#clone(*vectors_to_clone) ⇒ Object
Returns a ‘view’ of the DataFrame, i.e. the object IDs of the vectors are preserved.
Arguments
vectors_to_clone
- Names of vectors to clone. Optional. Will return a view of the whole data frame otherwise.
# File 'lib/daru_lite/dataframe.rb', line 542

def clone(*vectors_to_clone)
  vectors_to_clone.flatten! if ArrayHelper.array_of?(vectors_to_clone, Array)
  vectors_to_clone = @vectors.to_a if vectors_to_clone.empty?

  h = vectors_to_clone.map { |vec| [vec, self[vec]] }.to_h
  DaruLite::DataFrame.new(h, clone: false, order: vectors_to_clone, name: @name)
end
#clone_only_valid ⇒ Object
Returns a ‘shallow’ copy of DataFrame if missing data is not present, or a full copy of only valid data if missing data is present.
# File 'lib/daru_lite/dataframe.rb', line 552

def clone_only_valid
  if include_values?(*DaruLite::MISSING_VALUES)
    reject_values(*DaruLite::MISSING_VALUES)
  else
    clone
  end
end
#clone_structure ⇒ Object
Only clone the structure of the DataFrame.
# File 'lib/daru_lite/dataframe.rb', line 531

def clone_structure
  DaruLite::DataFrame.new([], order: @vectors.dup, index: @index.dup, name: @name)
end
#collect(axis = :vector, &block) ⇒ Object
Iterate over a row or vector and return results in a DaruLite::Vector. Specify axis with :vector or :row. Default to :vector.
Description
The #collect iterator works similarly to #map, the only difference being that it returns a DaruLite::Vector comprising the results of each block run. The resultant Vector has the same index as that of the axis over which collect has iterated. It also accepts the optional axis argument.
Arguments
- axis - The axis to iterate over. Can be :vector (or :column) or :row. Defaults to :vector.
# File 'lib/daru_lite/dataframe.rb', line 796

def collect(axis = :vector, &block)
  dispatch_to_axis_pl axis, :collect, &block
end
#collect_matrix ⇒ ::Matrix
Generate a matrix, based on vector names of the DataFrame.
:nocov: FIXME: Even not trying to cover this: I can’t get, how it is expected to work.… – zverok
# File 'lib/daru_lite/dataframe.rb', line 1003

def collect_matrix
  return to_enum(:collect_matrix) unless block_given?

  vecs = vectors.to_a
  rows = vecs.collect do |row|
    vecs.collect do |col|
      yield row, col
    end
  end

  Matrix.rows(rows)
end
#collect_row_with_index(&block) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 977

def collect_row_with_index(&block)
  return to_enum(:collect_row_with_index) unless block

  DaruLite::Vector.new(each_row_with_index.map(&block), index: @index)
end
#collect_rows(&block) ⇒ Object
Retrieves a DaruLite::Vector, based on the result of calculation performed on each row.
# File 'lib/daru_lite/dataframe.rb', line 971

def collect_rows(&block)
  return to_enum(:collect_rows) unless block

  DaruLite::Vector.new(each_row.map(&block), index: @index)
end
#collect_vector_with_index(&block) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 991

def collect_vector_with_index(&block)
  return to_enum(:collect_vector_with_index) unless block

  DaruLite::Vector.new(each_vector_with_index.map(&block), index: @vectors)
end
#collect_vectors(&block) ⇒ Object
Retrieves a DaruLite::Vector, based on the result of calculation performed on each vector.
# File 'lib/daru_lite/dataframe.rb', line 985

def collect_vectors(&block)
  return to_enum(:collect_vectors) unless block

  DaruLite::Vector.new(each_vector.map(&block), index: @vectors)
end
#compute(text, &block) ⇒ Object
Returns a vector computed from a string expression that refers to the DataFrame's vectors by name.
The expression is evaluated with instance_eval, so it can contain any variable or expression that is valid in Ruby.
For example:
a = DaruLite::Vector.new [1,2]
b = DaruLite::Vector.new [3,4]
ds = DaruLite::DataFrame.new({:a => a,:b => b})
ds.compute("a+b")
# => Vector [4,6]
# File 'lib/daru_lite/dataframe.rb', line 1195

def compute(text, &block)
  return instance_eval(&block) if block

  instance_eval(text)
end
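Because the expression runs through `instance_eval`, bare names such as `a` and `b` are resolved as method calls on the frame itself. The sketch below shows that mechanism with hypothetical `Column` and `ExprFrame` classes; it does not use DaruLite, and the element-wise `+` is defined only for the demonstration.

```ruby
# Hypothetical column type with element-wise addition.
Column = Struct.new(:values) do
  def +(other)
    Column.new(values.zip(other.values).map { |x, y| x + y })
  end
end

# Hypothetical frame: column names become readable through method_missing.
class ExprFrame
  def initialize(columns)
    @columns = columns
  end

  def compute(text = nil, &block)
    return instance_eval(&block) if block

    instance_eval(text)
  end

  def method_missing(name, *args, &block)
    @columns.key?(name) ? @columns[name] : super
  end

  def respond_to_missing?(name, include_private = false)
    @columns.key?(name) || super
  end
end

frame = ExprFrame.new({ a: Column.new([1, 2]), b: Column.new([3, 4]) })
frame.compute('a + b').values    # => [4, 6]
frame.compute { a + b }.values   # => [4, 6]
```

Since the string is evaluated as Ruby code, it should never be built from untrusted input.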
#concat(other_df) ⇒ Object
Concatenate another DataFrame along corresponding columns. If columns do not exist in both dataframes, they are filled with nils
# File 'lib/daru_lite/dataframe.rb', line 1481

def concat(other_df)
  vectors = (@vectors.to_a + other_df.vectors.to_a).uniq

  data = vectors.map do |v|
    get_vector_anyways(v).dup.concat(other_df.get_vector_anyways(v))
  end

  DaruLite::DataFrame.new(data, order: vectors)
end
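The nil-filling behaviour is easiest to see on plain Hashes of columns. In this sketch `concat_columns` is a hypothetical helper, not a DaruLite method: it takes the union of column names and pads any column missing on one side with nils of that side's row count.

```ruby
# Hypothetical helper: stack two column Hashes, padding absent columns with nil.
def concat_columns(left, right)
  left_size  = left.values.map(&:size).max || 0
  right_size = right.values.map(&:size).max || 0

  (left.keys + right.keys).uniq.to_h do |name|
    top    = left.fetch(name)  { Array.new(left_size) }
    bottom = right.fetch(name) { Array.new(right_size) }
    [name, top + bottom]
  end
end

concat_columns({ a: [1, 2], b: [3, 4] }, { a: [5], c: [6] })
# => { a: [1, 2, 5], b: [3, 4, nil], c: [nil, nil, 6] }
```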
#create_sql(table, charset = 'UTF8') ⇒ Object
Create a SQL CREATE TABLE statement based on the DataFrame.
Arguments
-
table - String specifying the name of the table that will be created in SQL.
-
charset - Character set. Default is “UTF8”.
# File 'lib/daru_lite/dataframe.rb', line 2026

def create_sql(table, charset = 'UTF8')
  sql = "CREATE TABLE #{table} ("
  fields = vectors.to_a.collect do |f|
    v = self[f]
    "#{f} #{v.db_type}"
  end

  sql + fields.join(",\n ") + ") CHARACTER SET=#{charset};"
end
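The statement is assembled by plain string concatenation. The sketch below shows the resulting shape, assuming a hypothetical `create_table_sql` helper that receives the column types explicitly instead of asking each vector for its `db_type`; the type strings are made up for the example.

```ruby
# Hypothetical helper: same string layout as the listing, types passed in.
def create_table_sql(table, column_types, charset = 'UTF8')
  fields = column_types.map { |name, type| "#{name} #{type}" }
  "CREATE TABLE #{table} (" + fields.join(",\n ") + ") CHARACTER SET=#{charset};"
end

puts create_table_sql('people', { id: 'INTEGER', name: 'VARCHAR (255)' })
# CREATE TABLE people (id INTEGER,
#  name VARCHAR (255)) CHARACTER SET=UTF8;
```

Table and column names are interpolated as-is, so they need to be valid, trusted SQL identifiers.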
#delete_row(index) ⇒ Object
Delete a row
# File 'lib/daru_lite/dataframe.rb', line 1035
def delete_row(index)
  idx = named_index_for index

  raise IndexError, "Index #{index} does not exist." unless @index.include? idx

  @index = DaruLite::Index.new(@index.to_a - [idx])
  each_vector do |vector|
    vector.delete_at idx
  end

  set_size
end
#delete_vector(vector) ⇒ Object
Delete a vector
# File 'lib/daru_lite/dataframe.rb', line 1018
def delete_vector(vector)
  raise IndexError, "Vector #{vector} does not exist." unless @vectors.include?(vector)

  @data.delete_at @vectors[vector]
  @vectors = DaruLite::Index.new @vectors.to_a - [vector]

  self
end
#delete_vectors(*vectors) ⇒ Object
Deletes a list of vectors
# File 'lib/daru_lite/dataframe.rb', line 1028
def delete_vectors(*vectors)
  Array(vectors).each { |vec| delete_vector vec }

  self
end
#dup(vectors_to_dup = nil) ⇒ Object
Duplicate the DataFrame entirely.
Arguments
- vectors_to_dup - An Array specifying the names of Vectors to be duplicated. Will duplicate the entire DataFrame if not specified.
# File 'lib/daru_lite/dataframe.rb', line 521
def dup(vectors_to_dup = nil)
  vectors_to_dup ||= @vectors.to_a

  src = vectors_to_dup.map { |vec| @data[@vectors.pos(vec)].dup }
  new_order = DaruLite::Index.new(vectors_to_dup)

  DaruLite::DataFrame.new src, order: new_order, index: @index.dup, name: @name, clone: true
end
#dup_only_valid(vecs = nil) ⇒ Object
Creates a new duplicate dataframe containing only rows without a single missing value.
# File 'lib/daru_lite/dataframe.rb', line 562
def dup_only_valid(vecs = nil)
  rows_with_nil = @data.map { |vec| vec.indexes(*DaruLite::MISSING_VALUES) }
                       .inject(&:concat)
                       .uniq

  row_indexes = @index.to_a
  (vecs.nil? ? self : dup(vecs)).row[*(row_indexes - rows_with_nil)]
end
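The row selection amounts to "keep every row position where no column holds a missing value". A minimal plain-Ruby sketch follows, assuming that missing means nil or Float::NAN (the actual contents of DaruLite::MISSING_VALUES are not shown here); both helper names are hypothetical.

```ruby
# Assumption: a value counts as missing when it is nil or a Float NaN.
def missing?(value)
  value.nil? || (value.is_a?(Float) && value.nan?)
end

# columns is a Hash of column name => Array; returns the positions of complete rows.
def complete_row_positions(columns)
  row_count = columns.values.first.size
  (0...row_count).reject do |pos|
    columns.values.any? { |col| missing?(col[pos]) }
  end
end

columns = { a: [1, nil, 3], b: [4.0, 5.0, Float::NAN] }
kept    = complete_row_positions(columns) # => [0]
```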
#each(axis = :vector, &block) ⇒ Object
Iterate over each row or vector of the DataFrame. Specify the axis by passing :vector or :row as the argument. Defaults to :vector.
Description
`#each` works exactly like Array#each. The default mode for `each` is to iterate over the columns of the DataFrame. To iterate over rows you must pass the axis, i.e. `:row`, as an argument.
Arguments
- axis - The axis to iterate over. Can be :vector (or :column) or :row. Defaults to :vector.
# File 'lib/daru_lite/dataframe.rb', line 777
def each(axis = :vector, &block)
  dispatch_to_axis axis, :each, &block
end
#each_index(&block) ⇒ Object
Iterate over each index of the DataFrame.
# File 'lib/daru_lite/dataframe.rb', line 711
def each_index(&block)
  return to_enum(:each_index) unless block

  @index.each(&block)

  self
end
#each_row ⇒ Object
Iterate over each row
# File 'lib/daru_lite/dataframe.rb', line 744
def each_row
  return to_enum(:each_row) unless block_given?

  @index.size.times do |pos|
    yield row_at(pos)
  end

  self
end
#each_row_with_index ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 754
def each_row_with_index
  return to_enum(:each_row_with_index) unless block_given?

  @index.each do |index|
    yield access_row(index), index
  end

  self
end
#each_vector(&block) ⇒ Object Also known as: each_column
Iterate over each vector
# File 'lib/daru_lite/dataframe.rb', line 720
def each_vector(&block)
  return to_enum(:each_vector) unless block

  @data.each(&block)

  self
end
#each_vector_with_index ⇒ Object Also known as: each_column_with_index
Iterate over each vector along with the name of the vector.
# File 'lib/daru_lite/dataframe.rb', line 731
def each_vector_with_index
  return to_enum(:each_vector_with_index) unless block_given?

  @vectors.each do |vector|
    yield @data[@vectors[vector]], vector
  end

  self
end
#filter(axis = :vector, &block) ⇒ Object
Retain vectors or rows if the block returns a truthy value.
Description
For filtering out certain rows/vectors based on their values, use the #filter method. By default it iterates over vectors and keeps those vectors for which the block returns true. It accepts an optional axis argument which lets you specify whether you want to iterate over vectors or rows.
Arguments
- axis - The axis to map over. Can be :vector (or :column) or :row. Defaults to :vector.
Usage
# Filter vectors
df.filter do |vector|
vector.type == :numeric and vector.median < 50
end
# Filter rows
df.filter(:row) do |row|
row[:a] + row[:d] < 100
end
# File 'lib/daru_lite/dataframe.rb', line 885
def filter(axis = :vector, &block)
  dispatch_to_axis_pl axis, :filter, &block
end
#filter_rows ⇒ Object
Iterates over each row and retains it in a new DataFrame if the block returns true for that row.
# File 'lib/daru_lite/dataframe.rb', line 1081
def filter_rows
  return to_enum(:filter_rows) unless block_given?

  keep_rows = @index.map { |index| yield access_row(index) }

  where keep_rows
end
#filter_vector(vec, &block) ⇒ Object
Creates a new vector with the data of the given field, taken from the rows for which the block returns true.
# File 'lib/daru_lite/dataframe.rb', line 1075
def filter_vector(vec, &block)
  DaruLite::Vector.new(each_row.select(&block).map { |row| row[vec] })
end
#filter_vectors(&block) ⇒ Object
Iterates over each vector and retains it in a new DataFrame if the block returns true for that vector.
# File 'lib/daru_lite/dataframe.rb', line 1091
def filter_vectors(&block)
  return to_enum(:filter_vectors) unless block

  dup.tap { |df| df.keep_vector_if(&block) }
end
#get_sub_dataframe(keys, by_position: true) ⇒ DaruLite::DataFrame
Extract a dataframe given row indexes or positions
# File 'lib/daru_lite/dataframe.rb', line 504
def get_sub_dataframe(keys, by_position: true)
  return DaruLite::DataFrame.new({}) if keys == []

  keys = @index.pos(*keys) unless by_position

  sub_df = row_at(*keys)
  sub_df = sub_df.to_df.transpose if sub_df.is_a?(DaruLite::Vector)

  sub_df
end
#get_vector_anyways(v) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 1475
def get_vector_anyways(v)
  @vectors.include?(v) ? self[v].to_a : Array.new(size)
end
#group_by(*vectors) ⇒ Object
Group elements by vector to perform operations on them. Returns a DaruLite::Core::GroupBy object. See the DaruLite::Core::GroupBy docs for a detailed list of possible operations.
Arguments
- vectors - An Array containing names of vectors to group by.
Usage
df = DaruLite::DataFrame.new({
a: %w{foo bar foo bar foo bar foo foo},
b: %w{one one two three two two one three},
c: [1 ,2 ,3 ,1 ,3 ,6 ,3 ,8],
d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
})
df.group_by([:a,:b,:c]).groups
#=> {["bar", "one", 2]=>[1],
# ["bar", "three", 1]=>[3],
# ["bar", "two", 6]=>[5],
# ["foo", "one", 1]=>[0],
# ["foo", "one", 3]=>[6],
# ["foo", "three", 8]=>[7],
# ["foo", "two", 3]=>[2, 4]}
# File 'lib/daru_lite/dataframe.rb', line 1453
def group_by(*vectors)
  vectors.flatten!
  missing = vectors - @vectors.to_a
  raise(ArgumentError, "Vector(s) missing: #{missing.join(', ')}") unless missing.empty?

  vectors = [@vectors.first] if vectors.empty?

  DaruLite::Core::GroupBy.new(self, vectors)
end
#group_by_and_aggregate(*group_by_keys, **aggregation_map) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 2413
def group_by_and_aggregate(*group_by_keys, **aggregation_map)
  group_by(*group_by_keys).aggregate(aggregation_map)
end
#has_missing_data? ⇒ Boolean Also known as: flawed?
# File 'lib/daru_lite/dataframe.rb', line 1218
def has_missing_data?
  @data.any? { |vec| vec.include_values?(*DaruLite::MISSING_VALUES) }
end
#has_vector?(vector) ⇒ Boolean
Check if a vector is present
# File 'lib/daru_lite/dataframe.rb', line 1293
def has_vector?(vector)
  @vectors.include? vector
end
#head(quantity = 10) ⇒ Object Also known as: first
The first rows of the DataFrame (ten by default).
# File 'lib/daru_lite/dataframe.rb', line 1341
def head(quantity = 10)
  row.at 0..(quantity - 1)
end
#include_values?(*values) ⇒ true, false
Check if any of given values occur in the data frame
# File 'lib/daru_lite/dataframe.rb', line 1237
def include_values?(*values)
  @data.any? { |vec| vec.include_values?(*values) }
end
#insert_vector(n, name, source) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 480
def insert_vector(n, name, source)
  raise ArgumentError unless source.is_a? Array

  vector = DaruLite::Vector.new(source, index: @index, name: @name)
  @data << vector
  @vectors = @vectors.add name
  ordr = @vectors.dup.to_a
  elmnt = ordr.pop
  ordr.insert n, elmnt
  self.order = ordr
end
#inspect(spacing = DaruLite.spacing, threshold = DaruLite.max_rows) ⇒ Object
Pretty print in a nice table format for the command line (irb/pry/iruby)
# File 'lib/daru_lite/dataframe.rb', line 2204
def inspect(spacing = DaruLite.spacing, threshold = DaruLite.max_rows)
  name_part = @name ? ": #{@name} " : ''
  spacing = [
    headers.to_a.map { |header| header.try(:length) || header.to_s.length }.max,
    spacing
  ].max

  "#<#{self.class}#{name_part}(#{nrows}x#{ncols})>#{$INPUT_RECORD_SEPARATOR}" +
    Formatters::Table.format(
      each_row.lazy,
      row_headers: row_headers,
      headers: headers,
      threshold: threshold,
      spacing: spacing
    )
end
#interact_code(vector_names, full) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 2269
def interact_code(vector_names, full)
  dfs = vector_names.zip(full).map do |vec_name, f|
    self[vec_name].contrast_code(full: f).each.to_a
  end

  all_vectors = recursive_product(dfs)
  DaruLite::DataFrame.new all_vectors,
                          order: all_vectors.map(&:name)
end
#join(other_df, opts = {}) ⇒ DaruLite::DataFrame
Join 2 DataFrames with SQL style joins. Currently supports inner, left outer, right outer and full outer joins.
# File 'lib/daru_lite/dataframe.rb', line 1949
def join(other_df, opts = {})
  DaruLite::Core::Merge.join(self, other_df, opts)
end
#keep_row_if ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 1062
def keep_row_if
  @index
    .reject { |idx| yield access_row(idx) }
    .each { |idx| delete_row idx }
end
#keep_vector_if ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 1068
def keep_vector_if
  @vectors.each do |vector|
    delete_vector(vector) unless yield(@data[@vectors[vector]], vector)
  end
end
#map(axis = :vector, &block) ⇒ Object
Map over each vector or row of the data frame according to the argument specified. Will return an Array of the resulting elements. To map over each row/vector and get a DataFrame, see #recode.
Description
The #map iterator works like Array#map. The value returned by each run of the block is added to an Array and the Array is returned. This method also accepts an axis argument, like #each. The default is :vector.
Arguments
- axis - The axis to map over. Can be :vector (or :column) or :row. Defaults to :vector.
# File 'lib/daru_lite/dataframe.rb', line 816
def map(axis = :vector, &block)
  dispatch_to_axis_pl axis, :map, &block
end
#map!(axis = :vector, &block) ⇒ Object
Destructive map. Modifies the DataFrame. Each run of the block must return a DaruLite::Vector. You can specify the axis to map over as the argument. Defaults to :vector.
Arguments
- axis - The axis to map over. Can be :vector (or :column) or :row. Defaults to :vector.
# File 'lib/daru_lite/dataframe.rb', line 828
def map!(axis = :vector, &block)
  if %i[vector column].include?(axis)
    map_vectors!(&block)
  elsif axis == :row
    map_rows!(&block)
  end
end
#map_rows(&block) ⇒ Object
Map each row
# File 'lib/daru_lite/dataframe.rb', line 935
def map_rows(&block)
  return to_enum(:map_rows) unless block

  each_row.map(&block)
end
#map_rows! ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 947
def map_rows!
  return to_enum(:map_rows!) unless block_given?

  index.dup.each do |i|
    row[i] = should_be_vector!(yield(row[i]))
  end

  self
end
#map_rows_with_index(&block) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 941
def map_rows_with_index(&block)
  return to_enum(:map_rows_with_index) unless block

  each_row_with_index.map(&block)
end
#map_vectors(&block) ⇒ Object
Map each vector and return an Array.
# File 'lib/daru_lite/dataframe.rb', line 910
def map_vectors(&block)
  return to_enum(:map_vectors) unless block

  @data.map(&block)
end
#map_vectors! ⇒ Object
Destructive form of #map_vectors
# File 'lib/daru_lite/dataframe.rb', line 917
def map_vectors!
  return to_enum(:map_vectors!) unless block_given?

  vectors.dup.each do |n|
    self[n] = should_be_vector!(yield(self[n]))
  end

  self
end
#map_vectors_with_index(&block) ⇒ Object
Map vectors along with the index.
# File 'lib/daru_lite/dataframe.rb', line 928
def map_vectors_with_index(&block)
  return to_enum(:map_vectors_with_index) unless block

  each_vector_with_index.map(&block)
end
#merge(other_df) ⇒ DaruLite::DataFrame
Merge vectors from two DataFrames. In case of name collision, the vector names are changed to x_1, x_2, and so on.
# File 'lib/daru_lite/dataframe.rb', line 1904
def merge(other_df)
  unless nrows == other_df.nrows
    raise ArgumentError,
          "Number of rows must be equal in this: #{nrows} and other: #{other_df.nrows}"
  end

  new_fields = (@vectors.to_a + other_df.vectors.to_a)
  new_fields = ArrayHelper.recode_repeated(new_fields)
  DataFrame.new({}, order: new_fields).tap do |df_new|
    (0...nrows).each do |i|
      df_new.add_row row[i].to_a + other_df.row[i].to_a
    end
    df_new.index = @index if @index == other_df.index
    df_new.update
  end
end
#missing_values_rows(missing_values = [nil]) ⇒ Object Also known as: vector_missing_values
Return a vector with the number of missing values in each row.
Arguments
- missing_values - An Array of the values that should be treated as ‘missing’. The default missing value is nil.
# File 'lib/daru_lite/dataframe.rb', line 1207
def missing_values_rows(missing_values = [nil])
  number_of_missing = each_row.map do |row|
    row.indexes(*missing_values).size
  end

  DaruLite::Vector.new number_of_missing, index: @index, name: "#{@name}_missing_rows"
end
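The per-row count can be pictured with plain arrays. This is a small sketch, not the DaruLite implementation: rows are modelled as Arrays and missing_counts is a hypothetical helper.

```ruby
# Count, for each row, how many values belong to the list of missing values.
def missing_counts(rows, missing_values = [nil])
  rows.map { |row| row.count { |value| missing_values.include?(value) } }
end

rows   = [[1, nil, 3], [nil, nil, 6], [7, 8, 9]]
counts = missing_counts(rows)            # => [1, 2, 0]
custom = missing_counts(rows, [nil, 9])  # => [1, 2, 1]
```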
#ncols ⇒ Object
The number of vectors
# File 'lib/daru_lite/dataframe.rb', line 1288
def ncols
  @vectors.size
end
#nest(*tree_keys, &block) ⇒ Object
Return a nested hash using vector names as keys and an array constructed of hashes with other values. If a block is provided, it is used to provide the values, with parameters row (the current row of the dataset), current (the last hash in the hierarchy) and name (the name of the key to include).
# File 'lib/daru_lite/dataframe.rb', line 1245
def nest(*tree_keys, &block)
  tree_keys = tree_keys[0] if tree_keys[0].is_a? Array

  each_row.with_object({}) do |row, current|
    # Create tree
    *keys, last = tree_keys
    current = keys.inject(current) { |c, f| c[row[f]] ||= {} }
    name = row[last]

    if block
      current[name] = yield(row, current, name)
    else
      current[name] ||= []
      current[name].push(row.to_h.delete_if { |key, _value| tree_keys.include? key })
    end
  end
end
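The shape of the nested hash is easier to see on a small input. The following plain-Ruby sketch mirrors the block-less branch of the listing above, with rows modelled as Hashes; nest_rows and the sample data are hypothetical.

```ruby
# rows is an Array of Hashes; tree_keys name the fields that form the hierarchy.
def nest_rows(rows, *tree_keys)
  rows.each_with_object({}) do |row, tree|
    *keys, last = tree_keys
    # Walk (and create) one hash level per leading key.
    node = keys.inject(tree) { |current, field| current[row[field]] ||= {} }
    name = row[last]
    node[name] ||= []
    # The leaf array collects the remaining fields of each row.
    node[name].push(row.reject { |key, _value| tree_keys.include?(key) })
  end
end

rows = [
  { group: 'g1', item: 'x', year: 1, score: 10 },
  { group: 'g1', item: 'x', year: 2, score: 11 },
  { group: 'g2', item: 'y', year: 1, score: 3 }
]
nested = nest_rows(rows, :group, :item)
# => { 'g1' => { 'x' => [{ year: 1, score: 10 }, { year: 2, score: 11 }] },
#      'g2' => { 'y' => [{ year: 1, score: 3 }] } }
```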
#nrows ⇒ Object
The number of rows
# File 'lib/daru_lite/dataframe.rb', line 1283
def nrows
  @index.size
end
#numeric_vector_names ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 1711
def numeric_vector_names
  @vectors.select { |v| self[v].numeric? }
end
#numeric_vectors ⇒ Object
Return the indexes of all the numeric vectors. Vectors that contain nils along with numbers are included.
# File 'lib/daru_lite/dataframe.rb', line 1704
def numeric_vectors
  # FIXME: Why _with_index ?..
  each_vector_with_index
    .select { |vec, _i| vec.numeric? }
    .map(&:last)
end
#one_to_many(parent_fields, pattern) ⇒ Object
Creates a new dataset for one-to-many relations on a dataset, based on a pattern of field names.
For example, suppose you have a survey on the number of children with this structure:
id, name, child_name_1, child_age_1, child_name_2, child_age_2
With
ds.one_to_many([:id], "child_%v_%n")
the fields named in the first parameter will be copied verbatim to the new dataset, and the fields that match the pattern in the second parameter will be added as one case for each different %n.
# File 'lib/daru_lite/dataframe.rb', line 1984
def one_to_many(parent_fields, pattern)
  vars, numbers = one_to_many_components(pattern)

  DataFrame.new([], order: [*parent_fields, '_col_id', *vars]).tap do |ds|
    each_row do |row|
      verbatim = parent_fields.map { |f| [f, row[f]] }.to_h
      numbers.each do |n|
        generated = one_to_many_row row, n, vars, pattern
        next if generated.values.all?(&:nil?)

        ds.add_row(verbatim.merge(generated).merge('_col_id' => n))
      end
    end
    ds.update
  end
end
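A plain-Ruby sketch of the one-to-many expansion may help. It assumes that %v stands for a variable name and %n for a number, and that %v precedes %n in the pattern; the private helpers used by the real method are not shown here, so the regexp approach below is only a guess at equivalent behavior. one_to_many_rows and the sample rows are hypothetical.

```ruby
# rows is an Array of Hashes; returns one Hash per parent row and distinct %n.
def one_to_many_rows(rows, parent_fields, pattern)
  # "child_%v_%n" becomes /\Achild_(.+?)_(\d+)\z/ (group 1 = %v, group 2 = %n).
  source = Regexp.escape(pattern).sub('%v') { '(.+?)' }.sub('%n') { '(\d+)' }
  regexp = Regexp.new("\\A#{source}\\z")

  rows.flat_map do |row|
    parent    = row.slice(*parent_fields)
    by_number = Hash.new { |hash, key| hash[key] = {} }
    row.each do |field, value|
      match = regexp.match(field.to_s)
      by_number[match[2]][match[1]] = value if match
    end
    # Skip numbers whose generated fields are all nil, as the listing above does.
    by_number.reject { |_n, vars| vars.values.all?(&:nil?) }
             .map { |n, vars| parent.merge('_col_id' => n.to_i).merge(vars) }
  end
end

rows = [
  { id: 1, name: 'Ann', child_name_1: 'Bo', child_age_1: 4, child_name_2: 'Cy', child_age_2: 2 },
  { id: 2, name: 'Dee', child_name_1: 'Ed', child_age_1: 7, child_name_2: nil, child_age_2: nil }
]
children = one_to_many_rows(rows, [:id], 'child_%v_%n')
```

Here the first parent yields two child rows and the second yields one, because its second child has only nil values.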
#only_numerics(opts = {}) ⇒ Object
Return a DataFrame of only the numerical Vectors. If clone: false is specified as option, only a view of the Vectors will be returned. Defaults to clone: true.
# File 'lib/daru_lite/dataframe.rb', line 1718
def only_numerics(opts = {})
  cln = opts[:clone] != false
  arry = numeric_vectors.map { |v| self[v] }

  order = Index.new(numeric_vectors)
  DaruLite::DataFrame.new(arry, clone: cln, order: order, index: @index)
end
#order=(order_array) ⇒ Object
Reorder the vectors in a dataframe
# File 'lib/daru_lite/dataframe.rb', line 1153
def order=(order_array)
  raise ArgumentError, 'Invalid order' unless vectors.to_a.tally == order_array.tally

  initialize(to_h, order: order_array)
end
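The validity check compares element counts, so a new order is accepted only when it is a permutation of the current vector names. A short plain-Ruby illustration of that comparison (valid_order? is a hypothetical helper; Array#tally requires Ruby 2.7 or later):

```ruby
# An order is valid when it contains exactly the same names, each the same number of times.
def valid_order?(current, proposed)
  current.tally == proposed.tally
end

permutation = valid_order?(%i[a b c], %i[c a b])    # => true
missing_one = valid_order?(%i[a b c], %i[a b])      # => false
duplicated  = valid_order?(%i[a b c], %i[a a b c])  # => false
```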
#pivot_table(opts = {}) ⇒ Object
Pivots a data frame on specified vectors and applies an aggregate function to quickly generate a summary.
Options
- :index - Keys to group by on the pivot table row index. Pass vector names contained in an Array.
- :vectors - Keys to group by on the pivot table column index. Pass vector names contained in an Array.
- :agg - Function to aggregate the grouped values. Defaults to :mean. Can use any of the statistics functions applicable on Vectors that can be found in the DaruLite::Statistics::Vector module.
- :values - Columns to aggregate. Will consider all numeric columns not specified in :index or :vectors. Optional.
Usage
df = DaruLite::DataFrame.new({
a: ['foo' , 'foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
b: ['one' , 'one', 'one', 'two', 'two', 'one', 'one', 'two', 'two'],
c: ['small','large','large','small','small','large','small','large','small'],
d: [1,2,2,3,3,4,5,6,7],
e: [2,4,4,6,6,8,10,12,14]
})
df.pivot_table(index: [:a], vectors: [:b], agg: :sum, values: :e)
#=>
# #<DaruLite::DataFrame:88342020 @name = 08cdaf4e-b154-4186-9084-e76dd191b2c9 @size = 2>
# [:e, :one] [:e, :two]
# [:bar] 18 26
# [:foo] 10 12
# File 'lib/daru_lite/dataframe.rb', line 1883
def pivot_table(opts = {})
  raise ArgumentError, 'Specify grouping index' if Array(opts[:index]).empty?

  index = opts[:index]
  vectors = opts[:vectors] || []
  aggregate_function = opts[:agg] || :mean
  values = prepare_pivot_values index, vectors, opts
  raise IndexError, 'No numeric vectors to aggregate' if values.empty?

  grouped = group_by(index)
  return grouped.send(aggregate_function) if vectors.empty?

  super_hash = make_pivot_hash grouped, vectors, values, aggregate_function

  pivot_dataframe super_hash
end
#recode(axis = :vector, &block) ⇒ Object
Maps over the DataFrame and returns a DataFrame. Each run of the block must return a DaruLite::Vector object. You can specify the axis to map over. Defaults to :vector.
Description
Recode works similarly to #map, but an important difference between the two is that recode returns a modified DaruLite::DataFrame instead of an Array. For this reason, #recode expects every run of the block to return a DaruLite::Vector.
Just like map and each, recode also accepts an optional axis argument.
Arguments
- axis - The axis to map over. Can be :vector (or :column) or :row. Defaults to :vector.
# File 'lib/daru_lite/dataframe.rb', line 853
def recode(axis = :vector, &block)
  dispatch_to_axis_pl axis, :recode, &block
end
#recode_rows ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 899
def recode_rows
  block_given? or return to_enum(:recode_rows)

  dup.tap do |df|
    df.each_row_with_index do |r, i|
      df.row[i] = should_be_vector!(yield(r))
    end
  end
end
#recode_vectors ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 889
def recode_vectors
  block_given? or return to_enum(:recode_vectors)

  dup.tap do |df|
    df.each_vector_with_index do |v, i|
      df[*i] = should_be_vector!(yield(v))
    end
  end
end
#reindex(new_index) ⇒ Object
Change the index of the DataFrame and preserve the labels of the previous indexing. New index can be DaruLite::Index or any of its subclasses.
# File 'lib/daru_lite/dataframe.rb', line 1587
def reindex(new_index)
  unless new_index.is_a?(DaruLite::Index)
    raise ArgumentError, 'Must pass the new index of type Index or its ' \
                         "subclasses, not #{new_index.class}"
  end

  cl = DaruLite::DataFrame.new({}, order: @vectors, index: new_index, name: @name)
  new_index.each_with_object(cl) do |idx, memo|
    memo.row[idx] = @index.include?(idx) ? row[idx] : Array.new(ncols)
  end
end
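Reindexing keeps the rows whose labels appear in the new index and creates nil-filled rows for labels that did not exist before. A plain-Ruby sketch of that rule, modelling the frame as a Hash of row label to Array (reindex_rows is a hypothetical helper):

```ruby
# rows_by_label maps each row label to its values; ncols is the number of columns.
def reindex_rows(rows_by_label, new_labels, ncols)
  new_labels.to_h do |label|
    [label, rows_by_label.fetch(label) { Array.new(ncols) }] # nil row for unknown labels
  end
end

rows      = { x: [1, 2], y: [3, 4] }
reindexed = reindex_rows(rows, %i[y z x], 2)
# => { y: [3, 4], z: [nil, nil], x: [1, 2] }
```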
#reindex_vectors(new_vectors) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 1463
def reindex_vectors(new_vectors)
  unless new_vectors.is_a?(DaruLite::Index)
    raise ArgumentError, 'Must pass the new index of type Index or its ' \
                         "subclasses, not #{new_vectors.class}"
  end

  cl = DaruLite::DataFrame.new({}, order: new_vectors, index: @index, name: @name)
  new_vectors.each_with_object(cl) do |vec, memo|
    memo[vec] = @vectors.include?(vec) ? self[vec] : Array.new(nrows)
  end
end
#reject_values(*values) ⇒ DaruLite::DataFrame
Returns a dataframe in which rows with any of the mentioned values are ignored.
# File 'lib/daru_lite/dataframe.rb', line 588
def reject_values(*values)
  positions =
    size.times.to_a - @data.flat_map { |vec| vec.positions(*values) }
  # Handle the case when positions size is 1 and #row_at wouldn't return a df
  if positions.size == 1
    pos = positions.first
    row_at(pos..pos)
  else
    row_at(*positions)
  end
end
#rename(new_name) ⇒ Object Also known as: name=
Rename the DataFrame.
# File 'lib/daru_lite/dataframe.rb', line 2122
def rename(new_name)
  @name = new_name
  self
end
#rename_vectors(name_map) ⇒ Object
Renames the vectors
Arguments
- name_map - A hash where the keys are the existing vector names and the values are the new names. If a vector is renamed to a vector name that is already in use, the existing one is overwritten.
Usage
df = DaruLite::DataFrame.new({ a: [1,2,3,4], b: [:a,:b,:c,:d], c: [11,22,33,44] })
df.rename_vectors :a => :alpha, :c => :gamma
df.vectors.to_a #=> [:alpha, :b, :gamma]
# File 'lib/daru_lite/dataframe.rb', line 1669
def rename_vectors(name_map)
  existing_targets = name_map.reject { |k, v| k == v }.values & vectors.to_a
  delete_vectors(*existing_targets)

  new_names = vectors.to_a.map { |v| name_map[v] || v }
  self.vectors = DaruLite::Index.new new_names
end
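The overwrite rule ("if the target name is already in use, the existing vector is dropped") can be traced on plain arrays of names. This sketch follows the listing above but works on names only; renamed_columns is a hypothetical helper.

```ruby
# names is the current list of vector names; name_map maps old names to new names.
def renamed_columns(names, name_map)
  # Targets that already exist (and are not renamed to themselves) are removed first.
  overwritten = name_map.reject { |from, to| from == to }.values & names
  (names - overwritten).map { |name| name_map[name] || name }
end

simple    = renamed_columns(%i[a b c], { a: :alpha, c: :gamma }) # => [:alpha, :b, :gamma]
collision = renamed_columns(%i[a b c], { a: :b })                # => [:b, :c]
```

In the collision case the original :b is removed and the renamed :a takes its name.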
#rename_vectors!(name_map) ⇒ Object
Renames the vectors and returns itself
Arguments
- name_map - A hash where the keys are the existing vector names and the values are the new names. If a vector is renamed to a vector name that is already in use, the existing one is overwritten.
Usage
df = DaruLite::DataFrame.new({ a: [1,2,3,4], b: [:a,:b,:c,:d], c: [11,22,33,44] })
df.rename_vectors! :a => :alpha, :c => :gamma # df
# File 'lib/daru_lite/dataframe.rb', line 1690
def rename_vectors!(name_map)
  rename_vectors(name_map)
  self
end
#replace_values(old_values, new_value) ⇒ DaruLite::DataFrame
Replace specified values with given value
# File 'lib/daru_lite/dataframe.rb', line 622
def replace_values(old_values, new_value)
  @data.each { |vec| vec.replace_values old_values, new_value }
  self
end
#reset_index ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 1599
def reset_index
  index_df = index.to_df
  names = index.name
  names = [names] unless names.instance_of?(Array)
  new_vectors = names + vectors.to_a
  self.index = index_df.index
  names.each do |name|
    self[name] = index_df[name]
  end
  self.order = new_vectors
  self
end
#respond_to_missing?(name, include_private = false) ⇒ Boolean
# File 'lib/daru_lite/dataframe.rb', line 2265
def respond_to_missing?(name, include_private = false)
  name.to_s.end_with?('=') || has_vector?(name) || super
end
#rolling_fillna(direction = :forward) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 667
def rolling_fillna(direction = :forward)
  dup.rolling_fillna!(direction)
end
#rolling_fillna!(direction = :forward) ⇒ Object
Rolling fillna: replaces all Float::NAN and nil values with the preceding or following value.
# File 'lib/daru_lite/dataframe.rb', line 662
def rolling_fillna!(direction = :forward)
  @data.each { |vec| vec.rolling_fillna!(direction) }
  self
end
#rotate_vectors(count = -1)) ⇒ Object
Return the dataframe with its vector positions rotated: the vector at position count becomes the first vector of the dataframe. If the dataframe has only one vector, it is returned without any change.
# File 'lib/daru_lite/dataframe.rb', line 1176
def rotate_vectors(count = -1)
  return self unless vectors.many?

  self.order = vectors.to_a.rotate(count)
  self
end
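Since the new order comes from Array#rotate, the effect of count can be checked on a plain array of vector names (the names below are made up for illustration):

```ruby
order = %i[a b c d]

# The default count of -1 moves the last vector to the front.
default_rotation = order.rotate(-1) # => [:d, :a, :b, :c]

# A positive count makes the element at that position the first one.
by_two = order.rotate(2)            # => [:c, :d, :a, :b]
```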
#row ⇒ Object
Access a row or set/create a row. Refer #[] and #[]= docs for details.
Usage
df.row[:a] # access row named ':a'
df.row[:b] = [1,2,3] # set row ':b' to [1,2,3]
# File 'lib/daru_lite/dataframe.rb', line 497
def row
  DaruLite::Accessors::DataFrameByRow.new(self)
end
#row_at(*positions) ⇒ DaruLite::Vector, DaruLite::DataFrame
Retrieve rows by positions
# File 'lib/daru_lite/dataframe.rb', line 340
def row_at(*positions)
  original_positions = positions
  positions = coerce_positions(*positions, nrows)
  validate_positions(*positions, nrows)

  if positions.is_a? Integer
    row = get_rows_for([positions])
    DaruLite::Vector.new row, index: @vectors
  else
    new_rows = get_rows_for(original_positions)
    DaruLite::DataFrame.new new_rows, index: @index.at(*original_positions), order: @vectors
  end
end
#save(filename) ⇒ Object
Use marshalling to save dataframe to a file.
# File 'lib/daru_lite/dataframe.rb', line 2171
def save(filename)
  DaruLite::IO.save self, filename
end
#set_at(positions, vector) ⇒ Object
Set vectors by positions
# File 'lib/daru_lite/dataframe.rb', line 437
def set_at(positions, vector)
  if positions.last == :row
    positions.pop
    return set_row_at(positions, vector)
  end

  validate_positions(*positions, ncols)
  vector =
    if vector.is_a? DaruLite::Vector
      vector.reindex @index
    else
      DaruLite::Vector.new vector
    end

  raise SizeError, 'Vector length should match index length' if vector.size != @index.size

  positions.each { |pos| @data[pos] = vector }
end
#set_index(new_index_col, keep: false, categorical: false) ⇒ Object
Set a particular column as the new index of the DataFrame.
# File 'lib/daru_lite/dataframe.rb', line 1545
def set_index(new_index_col, keep: false, categorical: false)
  if categorical
    strategy = SetCategoricalIndexStrategy
  elsif new_index_col.respond_to?(:to_a)
    strategy = SetMultiIndexStrategy
    new_index_col = new_index_col.to_a
  else
    strategy = SetSingleIndexStrategy
  end

  unless categorical
    uniq_size = strategy.uniq_size(self, new_index_col)
    raise ArgumentError, 'All elements in new index must be unique.' if @size != uniq_size
  end

  self.index = strategy.new_index(self, new_index_col)
  strategy.delete_vector(self, new_index_col) unless keep
  self
end
#set_row_at(positions, vector) ⇒ Object
Set rows by positions
# File 'lib/daru_lite/dataframe.rb', line 369
def set_row_at(positions, vector)
  validate_positions(*positions, nrows)
  vector =
    if vector.is_a? DaruLite::Vector
      vector.reindex @vectors
    else
      DaruLite::Vector.new vector
    end

  raise SizeError, 'Vector length should match row length' if vector.size != @vectors.size

  @data.each_with_index do |vec, pos|
    vec.set_at(positions, vector.at(pos))
  end
  @index = @data[0].index
  set_size
end
#shape ⇒ Object
Return the number of rows and columns of the DataFrame in an Array.
# File 'lib/daru_lite/dataframe.rb', line 1278
def shape
  [nrows, ncols]
end
#sort(vector_order, opts = {}) ⇒ Object
Non-destructive version of #sort!
# File 'lib/daru_lite/dataframe.rb', line 1845
def sort(vector_order, opts = {})
  dup.sort! vector_order, opts
end
#sort!(vector_order, opts = {}) ⇒ Object
Sorts a dataframe (ascending/descending) in the given priority sequence of vectors, with or without a block.
# File 'lib/daru_lite/dataframe.rb', line 1821
def sort!(vector_order, opts = {})
  raise ArgumentError, 'Required atleast one vector name' if vector_order.empty?

  # To enable sorting with categorical data,
  # map categories to integers preserving their order
  old = convert_categorical_vectors vector_order
  block = sort_prepare_block vector_order, opts

  order = @index.size.times.sort(&block)
  new_index = @index.reorder order

  # To reverse map mapping of categorical data to integers
  restore_categorical_vectors old

  @data.each do |vector|
    vector.reorder! order
  end

  self.index = new_index

  self
end
#split_by_category(cat_name) ⇒ Array
Split the dataframe into many dataframes based on category vector
# File 'lib/daru_lite/dataframe.rb', line 2297
def split_by_category(cat_name)
  cat_dv = self[cat_name]
  raise ArgumentError, "#{cat_name} is not a category vector" unless cat_dv.category?

  cat_dv.categories.map do |cat|
    where(cat_dv.eq cat)
      .rename(cat)
      .delete_vector cat_name
  end
end
#summary ⇒ String
Generate a summary of this DataFrame based on individual vectors in the DataFrame
# File 'lib/daru_lite/dataframe.rb', line 1728
def summary
  summary = "= #{name}"
  summary << "\n Number of rows: #{nrows}"
  @vectors.each do |v|
    summary << "\n Element:[#{v}]\n"
    summary << self[v].summary(1)
  end
  summary
end
#tail(quantity = 10) ⇒ Object Also known as: last
The last ten elements of the DataFrame
1350 1351 1352 1353 |
# File 'lib/daru_lite/dataframe.rb', line 1350 def tail(quantity = 10) start = [-quantity, -size].max row.at start..-1 end |
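The start position is clamped so that asking for more rows than exist simply returns every row. A plain-Ruby sketch of that range computation on an ordinary Array (tail_range is a hypothetical helper):

```ruby
# Negative range start, clamped to -size so it never reaches past the first row.
def tail_range(size, quantity = 10)
  start = [-quantity, -size].max
  start..-1
end

rows     = [1, 2, 3, 4]
last_two = rows[tail_range(rows.size, 2)] # => [3, 4]
all_rows = rows[tail_range(rows.size)]    # => [1, 2, 3, 4]
```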
#to_a ⇒ Object
Converts the DataFrame into an array of hashes where the key is the vector name and the value is the corresponding element. The element at index 0 of the returned array is the array of hashes, while the element at index 1 contains the index of each row of the dataframe. Each element in the index array corresponds to the row at the same position in the array of hashes.
# File 'lib/daru_lite/dataframe.rb', line 2053
def to_a
  [each_row.map(&:to_h), @index.to_a]
end
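The two-element structure is easiest to consume by destructuring it. The literal below only imitates the shape described above for a hypothetical two-row frame with vectors :a and :b and row labels :r1 and :r2; it does not call DaruLite.

```ruby
# Shape of the result: [array of row hashes, array of row index labels].
as_array = [
  [{ a: 1, b: 'x' }, { a: 2, b: 'y' }],
  %i[r1 r2]
]

row_hashes, labels = as_array

# Pair each label with the row hash at the same position.
by_label = labels.zip(row_hashes).to_h
# => { r1: { a: 1, b: 'x' }, r2: { a: 2, b: 'y' } }
```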
#to_category(*names) ⇒ DaruLite::DataFrame
Converts the specified non category type vectors to category type vectors
# File 'lib/daru_lite/dataframe.rb', line 2246
def to_category(*names)
  names.each { |n| self[n] = self[n].to_category }
  self
end
#to_df ⇒ self
Returns the dataframe. This can be convenient when the user does not know whether the object is a vector or a dataframe.
# File 'lib/daru_lite/dataframe.rb', line 2039
def to_df
  self
end
#to_h ⇒ Object
Converts DataFrame to a hash (explicit) with keys as vector names and values as the corresponding vectors.
# File 'lib/daru_lite/dataframe.rb', line 2069
def to_h
  @vectors
    .each_with_index
    .map { |vec_name, idx| [vec_name, @data[idx]] }.to_h
end
#to_html(threshold = DaruLite.max_rows) ⇒ Object
Convert to html for IRuby.
# File 'lib/daru_lite/dataframe.rb', line 2076
def to_html(threshold = DaruLite.max_rows)
  table_thead = to_html_thead
  table_tbody = to_html_tbody(threshold)
  path = if index.is_a?(MultiIndex)
           File.expand_path('iruby/templates/dataframe_mi.html.erb', __dir__)
         else
           File.expand_path('iruby/templates/dataframe.html.erb', __dir__)
         end
  ERB.new(File.read(path).strip).result(binding)
end
#to_html_tbody(threshold = DaruLite.max_rows) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 2097
def to_html_tbody(threshold = DaruLite.max_rows)
  threshold ||= @size
  table_tbody_path =
    if index.is_a?(MultiIndex)
      File.expand_path('iruby/templates/dataframe_mi_tbody.html.erb', __dir__)
    else
      File.expand_path('iruby/templates/dataframe_tbody.html.erb', __dir__)
    end
  ERB.new(File.read(table_tbody_path).strip).result(binding)
end
#to_html_thead ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 2087
def to_html_thead
  table_thead_path =
    if index.is_a?(MultiIndex)
      File.expand_path('iruby/templates/dataframe_mi_thead.html.erb', __dir__)
    else
      File.expand_path('iruby/templates/dataframe_thead.html.erb', __dir__)
    end
  ERB.new(File.read(table_thead_path).strip).result(binding)
end
#to_json(no_index = true) ⇒ Object
Convert to JSON. If no_index is true (the default), the index is NOT included in the resulting JSON; if it is false, the index is included alongside the rows.
# File 'lib/daru_lite/dataframe.rb', line 2059
def to_json(no_index = true)
  if no_index
    to_a[0].to_json
  else
    to_a.to_json
  end
end
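The branch above decides whether the second element of `to_a` (the row index) ends up in the output. A small sketch with plain Ruby data standing in for the result of `to_a` shows the two shapes; `frame_json` and the sample data are hypothetical and only mirror the branch in the source.

```ruby
require 'json'

# Plain-Ruby stand-in for what DataFrame#to_a returns:
# element 0 is the array of row hashes, element 1 is the row index.
rows  = [{ a: 1, b: 'x' }, { a: 2, b: 'y' }]
index = %w[r1 r2]
as_array = [rows, index]

# Hypothetical helper that mirrors the branch in DataFrame#to_json.
def frame_json(as_array, no_index = true)
  no_index ? as_array[0].to_json : as_array.to_json
end

without_index = frame_json(as_array)        # rows only
with_index    = frame_json(as_array, false) # rows plus index
```

With the default, the JSON is a flat array of row objects; with `false`, it is a two-element array whose second element is the index.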
#to_matrix ⇒ Object
Convert all vectors of type :numeric into a Matrix.
# File 'lib/daru_lite/dataframe.rb', line 2044
def to_matrix
  Matrix.columns each_vector.select(&:numeric?).map(&:to_a)
end
#to_s ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 2108
def to_s
  "#<#{self.class}#{": #{@name}" if @name}(#{nrows}x#{ncols})>"
end
#transpose ⇒ Object
Transpose a DataFrame, transposing the elements and swapping the row and column indexes.
# File 'lib/daru_lite/dataframe.rb', line 2193
def transpose
  DaruLite::DataFrame.new(
    each_vector.map(&:to_a).transpose,
    index: @vectors,
    order: @index,
    dtype: @dtype,
    name: @name
  )
end
#union(other_df) ⇒ Object
Concatenates another DataFrame, like #concat, and additionally tries to preserve the index. If the indices contain common elements, #union overwrites the corresponding rows of the first dataframe.
# File 'lib/daru_lite/dataframe.rb', line 1495
def union(other_df)
  index = (@index.to_a + other_df.index.to_a).uniq
  df = row[*(@index.to_a - other_df.index.to_a)]

  df = df.concat(other_df)
  df.index = DaruLite::Index.new(index)
  df
end
#uniq(*vtrs) ⇒ Object
Return unique rows by vector specified or all vectors
# File 'lib/daru_lite/dataframe.rb', line 703
def uniq(*vtrs)
  vecs = vtrs.empty? ? vectors.to_a : Array(vtrs)
  grouped = group_by(vecs)
  indexes = grouped.groups.values.map { |v| v[0] }.sort
  row[*indexes]
end
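The source groups rows by the chosen vectors and keeps the first row of each group, in original order. The same idea can be sketched on plain hashes; `unique_rows` and the sample data are hypothetical, and this is an illustration of the technique rather than the library's code path.

```ruby
# Rows as plain hashes, standing in for DataFrame rows (no daru_lite needed).
rows = [
  { id: 1, city: 'Oslo' },
  { id: 2, city: 'Rome' },
  { id: 3, city: 'Oslo' },
  { id: 2, city: 'Rome' }
]

# Hypothetical helper: keep the first row of every group of equal key values.
def unique_rows(rows, *keys)
  return [] if rows.empty?

  keys = rows.first.keys if keys.empty? # default: compare on every column
  positions = rows.each_index
                  .group_by { |i| rows[i].values_at(*keys) }
                  .values
                  .map(&:first) # first position in each group
                  .sort         # restore original row order
  rows.values_at(*positions)
end

by_city = unique_rows(rows, :city) # one row per distinct city
by_all  = unique_rows(rows)        # one row per distinct (id, city) pair
```

Restricting the comparison to `:city` keeps only the first Oslo and first Rome rows, while comparing on all columns drops only the exact duplicate.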
#update ⇒ Object
Updates the metadata (i.e. missing value positions) of each vector after assignment, deletion, etc. are complete. This is provided so that time is not wasted rebuilding the metadata every time elements are assigned or deleted. Updating data this way is called lazy loading. To set or unset lazy loading, see the DaruLite.lazy_update= method.
# File 'lib/daru_lite/dataframe.rb', line 2117
def update
  @data.each(&:update) if DaruLite.lazy_update
end
#vector_by_calculation(&block) ⇒ Object
DSL for yielding each row and returning a DaruLite::Vector based on the value each run of the block returns.
Usage
a1 = DaruLite::Vector.new([1, 2, 3, 4, 5, 6, 7])
a2 = DaruLite::Vector.new([10, 20, 30, 40, 50, 60, 70])
a3 = DaruLite::Vector.new([100, 200, 300, 400, 500, 600, 700])
ds = DaruLite::DataFrame.new({ :a => a1, :b => a2, :c => a3 })
total = ds.vector_by_calculation { a + b + c }
# <DaruLite::Vector:82314050 @name = nil @size = 7 >
# nil
# 0 111
# 1 222
# 2 333
# 3 444
# 4 555
# 5 666
# 6 777
# File 'lib/daru_lite/dataframe.rb', line 1133
def vector_by_calculation(&block)
  a = each_row.map { |r| r.instance_eval(&block) }

  DaruLite::Vector.new a, index: @index
end
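The DSL works because `instance_eval` runs the block with each row as `self`, so bare names such as `a` resolve to that row's values. A minimal sketch of the mechanism, using a Struct as a stand-in for a row (the Struct and the helper name `calculate_per_row` are hypothetical):

```ruby
# A Struct row stands in for a DataFrame row; its members play the role of vector names.
row_class = Struct.new(:a, :b, :c)
rows = [row_class.new(1, 10, 100), row_class.new(2, 20, 200), row_class.new(3, 30, 300)]

# Hypothetical helper: evaluate the block with each row as `self`,
# so `a + b + c` calls the accessors of the current row.
def calculate_per_row(rows, &block)
  rows.map { |r| r.instance_eval(&block) }
end

totals = calculate_per_row(rows) { a + b + c }
```

One consequence of this design is that local variables from the calling scope remain visible inside the block, but methods of the calling object are not, because `self` has changed.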
#vector_count_characters(vecs = nil) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 1263
def vector_count_characters(vecs = nil)
  vecs ||= @vectors.to_a

  collect_rows do |row|
    vecs.sum { |v| row[v].to_s.size }
  end
end
#vector_mean(max_missing = 0) ⇒ Object
Calculate mean of the rows of the dataframe.
Arguments
-
max_missing - The maximum number of elements in the row that can be missing for the mean to be calculated. Defaults to 0.
# File 'lib/daru_lite/dataframe.rb', line 1419
def vector_mean(max_missing = 0)
  # FIXME: in vector_sum we preserve created vector dtype, but
  # here we are not. Is this by design or ...? - zverok, 2016-05-18
  mean_vec = DaruLite::Vector.new [0] * @size, index: @index, name: "mean_#{@name}"

  each_row_with_index.with_object(mean_vec) do |(row, i), memo|
    memo[i] = row.indexes(*DaruLite::MISSING_VALUES).size > max_missing ? nil : row.mean
  end
end
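The effect of `max_missing` can be illustrated on plain Arrays: a row whose count of missing values exceeds the threshold yields `nil`, otherwise its mean is computed. The helper `row_mean` is hypothetical, and it assumes that the mean is taken over the non-missing values only and that missing means `nil` or `NaN`.

```ruby
# Hypothetical helper mirroring the max_missing rule of DataFrame#vector_mean.
# Assumption: the mean is taken over the non-missing values of the row.
def row_mean(row, max_missing = 0)
  missing, present = row.partition { |v| v.nil? || (v.is_a?(Float) && v.nan?) }
  return nil if missing.size > max_missing # too many gaps: no mean for this row
  return nil if present.empty?

  present.sum.to_f / present.size
end

rows = [[1, 2, 3], [4, nil, 6], [nil, nil, 9]]
strict  = rows.map { |r| row_mean(r) }    # any missing value yields nil
lenient = rows.map { |r| row_mean(r, 1) } # one missing value is tolerated
```

Raising the threshold from 0 to 1 recovers a mean for the second row while the third, with two gaps, still yields `nil`.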
#vector_sum(*args) ⇒ Object
Sum all numeric/specified vectors in the DataFrame.
Returns a new vector containing the sum of all numeric or specified vectors of the DataFrame. By default, if a vector contains a nil, the corresponding sum is nil. With the :skipnil argument set to true, nil values are assumed to be 0 (zero) and the sum vector is returned.
# File 'lib/daru_lite/dataframe.rb', line 1401
def vector_sum(*args)
  defaults = { vecs: nil, skipnil: false }
  options = args.last.is_a?(::Hash) ? args.pop : {}
  options = defaults.merge(options)
  vecs = args[0] || options[:vecs]
  skipnil = args[1] || options[:skipnil]

  vecs ||= numeric_vectors
  sum = DaruLite::Vector.new [0] * @size, index: @index, name: @name, dtype: @dtype
  vecs.inject(sum) { |memo, n| self[n].add(memo, skipnil: skipnil) }
end
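The nil handling described above can be sketched with plain Arrays as columns. The helper `sum_columns` is hypothetical and only imitates the documented behaviour (nil propagates by default, nil counts as zero with `skipnil`); it does not call `DaruLite::Vector#add`.

```ruby
# Hypothetical helper: element-wise sum of several columns (plain Arrays).
def sum_columns(columns, skipnil: false)
  columns.transpose.map do |values|
    if values.any?(&:nil?)
      next nil unless skipnil # default: a nil anywhere makes that sum nil

      values = values.compact # skipnil: nil contributes 0 to the sum
    end
    values.sum
  end
end

a = [1, 2, nil]
b = [10, nil, 30]
default_sum = sum_columns([a, b])                # nil wherever an input is nil
skipnil_sum = sum_columns([a, b], skipnil: true) # nil treated as 0
```

With these inputs the default result has `nil` in the last two positions, while the `skipnil` result carries the surviving operand through.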
#verify(*tests) ⇒ Object
Test each row with one or more tests. The function returns an array with all errors.
FIXME: description here is too sparse. As far as I can get, it should tell something about that each test is [descr, fields, block], and that first value may be column name to output. - zverok, 2016-05-18
# File 'lib/daru_lite/dataframe.rb', line 1105
def verify(*tests)
  id = tests.first.is_a?(Symbol) ? tests.shift : @vectors.first

  each_row_with_index.map do |row, i|
    tests.reject { |*_, block| block.call(row) }
         .map { |test| verify_error_message row, test, id, i }
  end.flatten
end
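Going by the FIXME note, each test appears to be an Array whose last element is a block that returns true when the row passes. The sketch below assumes the `[description, fields, block]` shape and uses plain hashes as rows; the helper `failed_checks` and its message format are hypothetical and do not reproduce DaruLite's own error messages.

```ruby
rows = [{ id: 1, age: 30 }, { id: 2, age: -5 }, { id: 3, age: 200 }]

# Assumed test shape: [description, fields, block]; the block returns true on success.
tests = [
  ['age must be non-negative', [:age], ->(row) { row[:age] >= 0 }],
  ['age must be under 150',    [:age], ->(row) { row[:age] < 150 }]
]

# Hypothetical helper: collect one message per failed test per row.
# The message format here is illustrative only.
def failed_checks(rows, tests, id_key)
  rows.flat_map do |row|
    tests.reject { |*_, block| block.call(row) } # keep only the failing tests
         .map { |descr, _fields, _block| "#{row[id_key]}: #{descr}" }
  end
end

errors = failed_checks(rows, tests, :id)
```

The `|*_, block|` pattern destructures each test Array and binds only its last element, which is why the block must come last.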
#where(bool_array) ⇒ Object
Query a DataFrame by passing a DaruLite::Core::Query::BoolArray object.
# File 'lib/daru_lite/dataframe.rb', line 2222
def where(bool_array)
  DaruLite::Core::Query.df_where self, bool_array
end
#which(&block) ⇒ Object
# File 'lib/daru_lite/extensions/which_dsl.rb', line 15
def which(&block)
  WhichQuery.new(self, &block).exec
end
#write_csv(filename, opts = {}) ⇒ Object
Write this DataFrame to a CSV file.
Arguments
-
filename - Path of CSV file where the DataFrame is to be saved.
Options
-
convert_comma - If set to true, will convert any commas in any of the data to full stops (‘.’).
All the options accepted by CSV.read() can also be passed into this function.
# File 'lib/daru_lite/dataframe.rb', line 2141
def write_csv(filename, opts = {})
  DaruLite::IO.dataframe_write_csv self, filename, opts
end
#write_excel(filename, opts = {}) ⇒ Object
Write this dataframe to an Excel Spreadsheet
Arguments
-
filename - The path of the file where the DataFrame should be written.
# File 'lib/daru_lite/dataframe.rb', line 2150
def write_excel(filename, opts = {})
  DaruLite::IO.dataframe_write_excel self, filename, opts
end
#write_sql(dbh, table) ⇒ Object
Insert each row of the DataFrame into the selected table.
Arguments
-
dbh - DBI database connection object.
-
table - Name of the table to insert into.
Usage
ds = DaruLite::DataFrame.new({:id=>DaruLite::Vector.new([1,2,3]), :name=>DaruLite::Vector.new(["a","b","c"])})
dbh = DBI.connect("DBI:Mysql:database:localhost", "user", "password")
ds.write_sql(dbh,"test")
# File 'lib/daru_lite/dataframe.rb', line 2166
def write_sql(dbh, table)
  DaruLite::IO.dataframe_write_sql self, dbh, table
end