Class: Daru::DataFrame
- Extended by:
- Gem::Deprecate
- Includes:
- Maths::Arithmetic::DataFrame, Maths::Statistics::DataFrame, Plotting::DataFrame::NyaplotLibrary
- Defined in:
- lib/daru/dataframe.rb,
lib/daru/monkeys.rb,
lib/daru/extensions/rserve.rb
Instance Attribute Summary
-
#data ⇒ Object
readonly
TOREMOVE.
-
#index ⇒ Object
The index of the rows of the DataFrame.
-
#name ⇒ Object
readonly
The name of the DataFrame.
-
#size ⇒ Object
readonly
The number of rows present in the DataFrame.
-
#vectors ⇒ Object
The vectors (columns) index of the DataFrame.
Class Method Summary
- ._load(data) ⇒ Object
-
.crosstab_by_assignation(rows, columns, values) ⇒ Object
Generates a new dataset, using three vectors - Rows - Columns - Values.
-
.from_activerecord(relation, *fields) ⇒ Object
Read a dataframe from AR::Relation.
-
.from_csv(path, opts = {}, &block) ⇒ Object
Load data from a CSV file.
-
.from_excel(path, opts = {}, &block) ⇒ Object
Read data from an Excel file into a DataFrame.
-
.from_plaintext(path, fields) ⇒ Object
Read the database from a plaintext file.
-
.from_sql(dbh, query) ⇒ Object
Reads a database query and returns a Dataset.
-
.rows(source, opts = {}) ⇒ Object
Create DataFrame by specifying rows as an Array of Arrays or Array of Daru::Vector objects.
Instance Method Summary
- #==(other) ⇒ Object
-
#[](*names) ⇒ Object
Access row or vector.
-
#[]=(*args) ⇒ Object
Insert a new row/vector of the specified name or modify a previous row.
- #_dump(_depth) ⇒ Object
- #add_row(row, index = nil) ⇒ Object
- #add_vector(n, vector) ⇒ Object
- #add_vectors_by_split(name, join = '-', sep = Daru::SPLIT_TOKEN) ⇒ Object
- #add_vectors_by_split_recode(nm, join = '-', sep = Daru::SPLIT_TOKEN) ⇒ Object
-
#all?(axis = :vector, &block) ⇒ Boolean
Works like Array#all?.
-
#any?(axis = :vector, &block) ⇒ Boolean
Works like Array#any?.
-
#at(*positions) ⇒ Daru::Vector, Daru::DataFrame
Retrieve vectors by positions.
-
#bootstrap(n = nil) ⇒ Daru::DataFrame
Creates a DataFrame with the random data, of n size.
-
#clone(*vectors_to_clone) ⇒ Object
Returns a ‘view’ of the DataFrame, i.e. the object IDs of vectors are preserved.
-
#clone_only_valid ⇒ Object
Returns a ‘shallow’ copy of DataFrame if missing data is not present, or a full copy of only valid data if missing data is present.
-
#clone_structure ⇒ Object
Only clone the structure of the DataFrame.
-
#collect(axis = :vector, &block) ⇒ Object
Iterate over a row or vector and return results in a Daru::Vector.
-
#collect_matrix ⇒ ::Matrix
Generate a matrix, based on vector names of the DataFrame.
- #collect_row_with_index(&block) ⇒ Object
-
#collect_rows(&block) ⇒ Object
Retrieves a Daru::Vector, based on the result of calculation performed on each row.
- #collect_vector_with_index(&block) ⇒ Object
-
#collect_vectors(&block) ⇒ Object
Retrieves a Daru::Vector, based on the result of calculation performed on each vector.
-
#compute(text, &block) ⇒ Object
Returns a vector computed from a string expression that refers to the DataFrame's vectors.
-
#concat(other_df) ⇒ Object
Concatenate another DataFrame along corresponding columns.
-
#create_sql(table, charset = 'UTF8') ⇒ Object
Create an SQL CREATE TABLE statement based on a given Dataset.
-
#delete_row(index) ⇒ Object
Delete a row.
-
#delete_vector(vector) ⇒ Object
Delete a vector.
-
#delete_vectors(*vectors) ⇒ Object
Deletes a list of vectors.
-
#dup(vectors_to_dup = nil) ⇒ Object
Duplicate the DataFrame entirely.
-
#dup_only_valid(vecs = nil) ⇒ Object
Creates a new duplicate dataframe containing only rows without a single missing value.
-
#each(axis = :vector, &block) ⇒ Object
Iterate over each row or vector of the DataFrame.
-
#each_index(&block) ⇒ Object
Iterate over each index of the DataFrame.
-
#each_row ⇒ Object
Iterate over each row.
- #each_row_with_index ⇒ Object
-
#each_vector(&block) ⇒ Object
(also: #each_column)
Iterate over each vector.
-
#each_vector_with_index ⇒ Object
(also: #each_column_with_index)
Iterate over each vector along with the name of the vector.
-
#filter(axis = :vector, &block) ⇒ Object
Retain vectors or rows if the block returns a truthy value.
-
#filter_rows ⇒ Object
Iterates over each row and retains it in a new DataFrame if the block returns true for that row.
-
#filter_vector(vec, &block) ⇒ Object
Creates a new vector with the data of a given field for the rows where the block returns true.
-
#filter_vectors(&block) ⇒ Object
Iterates over each vector and retains it in a new DataFrame if the block returns true for that vector.
- #get_vector_anyways(v) ⇒ Object
-
#group_by(*vectors) ⇒ Object
Group elements by vector to perform operations on them.
- #has_missing_data? ⇒ Boolean (also: #flawed?)
-
#has_vector?(vector) ⇒ Boolean
Check if a vector is present.
-
#head(quantity = 10) ⇒ Object
(also: #first)
The first ten elements of the DataFrame.
-
#include_values?(*values) ⇒ true, false
Check if any of given values occur in the data frame.
-
#initialize(source, opts = {}) ⇒ DataFrame
constructor
DataFrame basically consists of an Array of Vector objects.
-
#inspect(spacing = 10, threshold = 15) ⇒ Object
Pretty print in a nice table format for the command line (irb/pry/iruby).
- #interact_code(vector_names, full) ⇒ Object
-
#join(other_df, opts = {}) ⇒ Daru::DataFrame
Join 2 DataFrames with SQL style joins.
- #keep_row_if ⇒ Object
- #keep_vector_if ⇒ Object
-
#map(axis = :vector, &block) ⇒ Object
Map over each vector or row of the data frame according to the argument specified.
-
#map!(axis = :vector, &block) ⇒ Object
Destructive map.
-
#map_rows(&block) ⇒ Object
Map each row.
- #map_rows! ⇒ Object
- #map_rows_with_index(&block) ⇒ Object
-
#map_vectors(&block) ⇒ Object
Map each vector and return an Array.
-
#map_vectors! ⇒ Object
Destructive form of #map_vectors.
-
#map_vectors_with_index(&block) ⇒ Object
Map vectors along with the index.
-
#merge(other_df) ⇒ Daru::DataFrame
Merge vectors from two DataFrames.
- #method_missing(name, *args, &block) ⇒ Object
-
#missing_values_rows(missing_values = [nil]) ⇒ Object
(also: #vector_missing_values)
Return a vector with the number of missing values in each row.
-
#ncols ⇒ Object
The number of vectors.
-
#nest(*tree_keys, &_block) ⇒ Object
Return a nested hash using vector names as keys and an array constructed of hashes with other values.
-
#nrows ⇒ Object
The number of rows.
- #numeric_vector_names ⇒ Object
-
#numeric_vectors ⇒ Object
Return the indexes of all the numeric vectors.
-
#one_to_many(parent_fields, pattern) ⇒ Object
Creates a new dataset for one to many relations on a dataset, based on pattern of field names.
-
#only_numerics(opts = {}) ⇒ Object
Return a DataFrame of only the numerical Vectors.
-
#pivot_table(opts = {}) ⇒ Object
Pivots a data frame on specified vectors and applies an aggregate function to quickly generate a summary.
- #plotting_library=(lib) ⇒ Object
-
#recast(opts = {}) ⇒ Object
Change dtypes of vectors by supplying a hash of :vector_name => :new_dtype.
-
#recode(axis = :vector, &block) ⇒ Object
Maps over the DataFrame and returns a DataFrame.
- #recode_rows ⇒ Object
- #recode_vectors ⇒ Object
-
#reindex(new_index) ⇒ Object
Change the index of the DataFrame and preserve the labels of the previous indexing.
- #reindex_vectors(new_vectors) ⇒ Object
-
#reject_values(*values) ⇒ Daru::DataFrame
Returns a dataframe in which rows with any of the mentioned values are ignored.
-
#rename(new_name) ⇒ Object
(also: #name=)
Rename the DataFrame.
-
#rename_vectors(name_map) ⇒ Object
Renames the vectors.
-
#replace_values(old_values, new_value) ⇒ Daru::DataFrame
Replace specified values with given value.
-
#report_building(b) ⇒ Object
- #respond_to_missing?(name, include_private = false) ⇒ Boolean
-
#row ⇒ Object
Access a row or set/create a row.
-
#row_at(*positions) ⇒ Daru::Vector, Daru::DataFrame
Retrieve rows by positions.
-
#save(filename) ⇒ Object
Use marshalling to save dataframe to a file.
-
#set_at(positions, vector) ⇒ Object
Set vectors by positions.
-
#set_index(new_index, opts = {}) ⇒ Object
Set a particular column as the new index of the DataFrame.
-
#set_row_at(positions, vector) ⇒ Object
Set rows by positions.
-
#shape ⇒ Object
Return the number of rows and columns of the DataFrame in an Array.
-
#sort(vector_order, opts = {}) ⇒ Object
Non-destructive version of #sort!.
-
#sort!(vector_order, opts = {}) ⇒ Object
Sorts a dataframe (ascending/descending) in the given priority sequence of vectors, with or without a block.
-
#split_by_category(cat_name) ⇒ Array
Split the dataframe into many dataframes based on category vector.
-
#summary(method = :to_text) ⇒ Object
Generate a summary of this DataFrame with ReportBuilder.
-
#tail(quantity = 10) ⇒ Object
(also: #last)
The last ten elements of the DataFrame.
-
#to_a ⇒ Object
Converts the DataFrame into an array of hashes where key is vector name and value is the corresponding element.
-
#to_category(*names) ⇒ Daru::DataFrame
Converts the specified non category type vectors to category type vectors.
-
#to_df ⇒ self
Returns the dataframe.
-
#to_gsl ⇒ Object
Convert all numeric vectors to GSL::Matrix.
-
#to_h ⇒ Object
Converts DataFrame to a hash (explicit) with keys as vector names and values as the corresponding vectors.
-
#to_hash ⇒ Object
NOTE: This alias will soon be removed.
-
#to_html(threshold = 30) ⇒ Object
Convert to html for IRuby.
-
#to_json(no_index = true) ⇒ Object
Convert to json.
-
#to_matrix ⇒ Object
Convert all vectors of type :numeric into a Matrix.
-
#to_nmatrix ⇒ Object
Convert all vectors of type :numeric and not containing nils into an NMatrix.
-
#to_nyaplotdf ⇒ Object
Return a Nyaplot::DataFrame from the data of this DataFrame.
-
#to_REXP ⇒ Object
- #to_s ⇒ Object
-
#transpose ⇒ Object
Transpose a DataFrame, transposing elements and row, column indexing.
-
#update ⇒ Object
Method for updating the metadata (i.e. missing value positions) of the DataFrame after assignment/deletion etc.
-
#vector_by_calculation(&block) ⇒ Object
DSL for yielding each row and returning a Daru::Vector based on the value each run of the block returns.
- #vector_count_characters(vecs = nil) ⇒ Object
-
#vector_mean(max_missing = 0) ⇒ Object
Calculate mean of the rows of the dataframe.
-
#vector_sum(vecs = nil) ⇒ Object
Returns a vector with sum of all vectors specified in the argument.
-
#verify(*tests) ⇒ Object
Test each row with one or more tests.
-
#where(bool_array) ⇒ Object
Query a DataFrame by passing a Daru::Core::Query::BoolArray object.
-
#write_csv(filename, opts = {}) ⇒ Object
Write this DataFrame to a CSV file.
-
#write_excel(filename, opts = {}) ⇒ Object
Write this dataframe to an Excel Spreadsheet.
-
#write_sql(dbh, table) ⇒ Object
Insert each case of the Dataset on the selected table.
Methods included from Plotting::DataFrame::NyaplotLibrary
Methods included from Maths::Statistics::DataFrame
#acf, #correlation, #count, #covariance, #cumsum, #describe, #ema, #max, #mean, #median, #min, #mode, #percent_change, #product, #range, #rolling_count, #rolling_max, #rolling_mean, #rolling_median, #rolling_min, #rolling_std, #rolling_variance, #standardize, #std, #sum, #variance_sample
Methods included from Maths::Arithmetic::DataFrame
#%, #*, #**, #+, #-, #/, #exp, #round, #sqrt
Constructor Details
#initialize(source, opts = {}) ⇒ DataFrame
DataFrame basically consists of an Array of Vector objects. These objects are indexed by row and by column through the index and vectors Index objects, respectively.
Arguments
-
source - Source from which the DataFrame is to be initialized. Can be a Hash
of names and vectors (array or Daru::Vector), an array of arrays or array of Daru::Vectors.
Options
:order
- An Array/Daru::Index/Daru::MultiIndex containing the order in which Vectors should appear in the DataFrame.
:index
- An Array/Daru::Index/Daru::MultiIndex containing the order in which rows of the DataFrame will be named.
:name
- A name for the DataFrame.
:clone
- Specify as true or false. When set to false, and Vector objects are passed for the source, the Vector objects will not be duplicated when creating the DataFrame. Will have no effect if an Array is passed in the source, or if the passed Daru::Vectors have different indexes. Defaults to true.
Usage
df = Daru::DataFrame.new({a: [1,2,3,4], b: [6,7,8,9]}, order: [:b, :a],
index: [:a, :b, :c, :d], name: :spider_man)
# =>
# <Daru::DataFrame:80766980 @name = spider_man @size = 4>
#       b   a
#   a   6   1
#   b   7   2
#   c   8   3
#   d   9   4
# File 'lib/daru/dataframe.rb', line 242
def initialize source, opts={} # rubocop:disable Metrics/MethodLength
  vectors, index = opts[:order], opts[:index] # FIXME: just keyword arges after Ruby 2.1
  @data = []
  @name = opts[:name]

  case source
  when ->(s) { s.empty? }
    @vectors = Index.coerce vectors
    @index   = Index.coerce index
    create_empty_vectors
  when Array
    initialize_from_array source, vectors, index, opts
  when Hash
    initialize_from_hash source, vectors, index, opts
  end

  set_size
  validate
  update
  self.plotting_library = Daru.plotting_library
end
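The constructor picks its initialization path with a `case` whose first branch is a lambda. The following plain-Ruby sketch isolates that dispatch; `source_kind` is a hypothetical helper, not part of Daru, and it only reports which branch a given source would reach.

```ruby
# Hypothetical helper (not a Daru method): names the branch of the
# constructor's `case` statement that a given source would take.
def source_kind(source)
  case source
  when ->(s) { s.empty? } then :empty # a lambda works in `when` because Proc#=== calls it
  when Array              then :array
  when Hash               then :hash
  end
end

p source_kind([])               # => :empty
p source_kind({})               # => :empty
p source_kind([[1, 2], [3, 4]]) # => :array
p source_kind({ a: [1, 2] })    # => :hash
```

Because the emptiness check comes first, an empty Array and an empty Hash both take the same branch.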
Dynamic Method Handling
This class handles dynamic methods through the method_missing method
#method_missing(name, *args, &block) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1894
def method_missing(name, *args, &block)
  if name =~ /(.+)\=/
    insert_or_modify_vector [name[/(.+)\=/].delete('=').to_sym], args[0]
  elsif has_vector? name
    self[name]
  else
    super
  end
end
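The same pattern can be tried without Daru. The sketch below uses `TinyFrame`, a hypothetical stand-in that keeps its columns in a plain Hash; it illustrates the dispatch idea only and is not the library's implementation.

```ruby
# TinyFrame is a hypothetical stand-in for a DataFrame; columns live in a Hash.
class TinyFrame
  def initialize(columns)
    @columns = columns
  end

  def method_missing(name, *args, &block)
    if name.to_s.end_with?('=')
      # df.b = [...] inserts or replaces the column :b
      @columns[name.to_s.chomp('=').to_sym] = args[0]
    elsif @columns.key?(name)
      # df.a reads the column :a
      @columns[name]
    else
      super
    end
  end

  # Keeps respond_to? consistent with the dynamic methods above.
  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?('=') || @columns.key?(name) || super
  end
end

df = TinyFrame.new({ a: [1, 2, 3] })
p df.a # => [1, 2, 3]
df.b = [4, 5, 6]
p df.b # => [4, 5, 6]
```

Any name that is neither a setter nor an existing column falls through to `super` and raises NoMethodError as usual.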
Instance Attribute Details
#data ⇒ Object (readonly)
TOREMOVE
# File 'lib/daru/dataframe.rb', line 195
def data
  @data
end
#index ⇒ Object
The index of the rows of the DataFrame
# File 'lib/daru/dataframe.rb', line 198
def index
  @index
end
#name ⇒ Object (readonly)
The name of the DataFrame
# File 'lib/daru/dataframe.rb', line 201
def name
  @name
end
#size ⇒ Object (readonly)
The number of rows present in the DataFrame
# File 'lib/daru/dataframe.rb', line 204
def size
  @size
end
#vectors ⇒ Object
The vectors (columns) index of the DataFrame
# File 'lib/daru/dataframe.rb', line 193
def vectors
  @vectors
end
Class Method Details
._load(data) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1819
def self._load data
  h = Marshal.load data
  Daru::DataFrame.new(h[:data],
    index: h[:index],
    order: h[:order],
    name:  h[:name])
end
.crosstab_by_assignation(rows, columns, values) ⇒ Object
Generates a new dataset, using three vectors
-
Rows
-
Columns
-
Values
For example, you have these values
x y v
a a 0
a b 1
b a 1
b b 0
You obtain
id a b
a 0 1
b 1 0
Useful to process outputs from databases
# File 'lib/daru/dataframe.rb', line 151
def crosstab_by_assignation rows, columns, values
  raise 'Three vectors should be equal size' if
    rows.size != columns.size || rows.size!=values.size

  data = Hash.new { |h, col|
    h[col] = rows.factors.map { |r| [r, nil] }.to_h
  }
  columns.zip(rows, values).each { |c, r, v| data[c][r] = v }

  # FIXME: in fact, WITHOUT this line you'll obtain more "right"
  # data: with vectors having "rows" as an index...
  data = data.map { |c, r| [c, r.values] }.to_h
  data[:_id] = rows.factors

  DataFrame.new(data)
end
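The reshaping can be followed step by step on plain Arrays. In this sketch `crosstab` is a hypothetical helper that returns a Hash of column name => values instead of a DataFrame, and `uniq` stands in for the vector's `factors`.

```ruby
# Hypothetical helper, not Daru's implementation: cross-tabulates three
# equally sized Arrays into a Hash of column name => Array of cell values.
def crosstab(rows, columns, values)
  unless rows.size == columns.size && rows.size == values.size
    raise ArgumentError, 'Three arrays should be equal size'
  end

  row_keys = rows.uniq
  # One entry per distinct column value; every cell starts out as nil.
  table = Hash.new { |h, col| h[col] = row_keys.to_h { |r| [r, nil] } }
  columns.zip(rows, values).each { |c, r, v| table[c][r] = v }

  # Flatten each {row => value} Hash into an Array ordered like row_keys.
  result = { _id: row_keys }
  table.each { |col, cells| result[col] = cells.values }
  result
end

out = crosstab(%w[a a b b], %w[a b a b], [0, 1, 1, 0])
p out[:_id] # => ["a", "b"]
p out['a']  # => [0, 1]
p out['b']  # => [1, 0]
```

The input is the x/y/v table from the description, so the three Arrays printed correspond to the id, a and b columns of the result shown there.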
.from_activerecord(relation, *fields) ⇒ Object
# File 'lib/daru/dataframe.rb', line 95
def from_activerecord relation, *fields
  Daru::IO.from_activerecord relation, *fields
end
.from_csv(path, opts = {}, &block) ⇒ Object
Load data from a CSV file. Specify an optional block to grab the CSV object and pre-condition it (for example use the `convert` or `header_convert` methods).
Arguments
-
path - Path of the file to load specified as a String.
Options
Accepts the same options as the Daru::DataFrame constructor and CSV.open() and uses those to eventually construct the resulting DataFrame.
Verbose Description
You can specify all the options to the `.from_csv` function that you do to the Ruby `CSV.read()` function, since this is what is used internally.
For example, if the columns in your CSV file are separated by something other than commas, you can use the `:col_sep` option. If you want to convert numeric values to numbers and not keep them as strings, you can use the `:converters` option and set it to `:numeric`.
The `.from_csv` function uses the following defaults for reading CSV files (that are passed into the `CSV.read()` function):
{
:col_sep => ',',
:converters => :numeric
}
# File 'lib/daru/dataframe.rb', line 47
def from_csv path, opts={}, &block
  Daru::IO.from_csv path, opts, &block
end
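The effect of `:col_sep` and `:converters` can be checked with Ruby's CSV library on its own, which is what the description above says is used internally. The sample text and the semicolon separator are assumptions chosen for illustration.

```ruby
require 'csv'

text = "id;score\n1;2.5\n2;4\n"

# Without :converters every field stays a String.
raw = CSV.parse(text, col_sep: ';')
# With converters: :numeric, fields that look like numbers become Integer or Float.
typed = CSV.parse(text, col_sep: ';', converters: :numeric)

p raw   # => [["id", "score"], ["1", "2.5"], ["2", "4"]]
p typed # => [["id", "score"], [1, 2.5], [2, 4]]
```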
.from_excel(path, opts = {}, &block) ⇒ Object
Read data from an Excel file into a DataFrame.
Arguments
-
path - Path of the file to be read.
Options
-
:worksheet_id - ID of the worksheet that is to be read.
# File 'lib/daru/dataframe.rb', line 60
def from_excel path, opts={}, &block
  Daru::IO.from_excel path, opts, &block
end
.from_plaintext(path, fields) ⇒ Object
Read the database from a plaintext file. For this method to work, the data should be present in a plain text file in columns. See spec/fixtures/bank2.dat for an example.
Arguments
-
path - Path of the file to be read.
-
fields - Vector names of the resulting database.
Usage
df = Daru::DataFrame.from_plaintext 'spec/fixtures/bank2.dat', [:v1,:v2,:v3,:v4,:v5,:v6]
# File 'lib/daru/dataframe.rb', line 111
def from_plaintext path, fields
  Daru::IO.from_plaintext path, fields
end
.from_sql(dbh, query) ⇒ Object
# File 'lib/daru/dataframe.rb', line 75
def from_sql dbh, query
  Daru::IO.from_sql dbh, query
end
.rows(source, opts = {}) ⇒ Object
Create DataFrame by specifying rows as an Array of Arrays or Array of Daru::Vector objects.
# File 'lib/daru/dataframe.rb', line 117
def rows source, opts={}
  raise SizeError, 'All vectors must have same length' \
    unless source.all? { |v| v.size == source.first.size }

  opts[:order] ||= guess_order(source)

  if ArrayHelper.array_of?(source, Array)
    DataFrame.new(source.transpose, opts)
  elsif ArrayHelper.array_of?(source, Vector)
    from_vector_rows(source, opts)
  else
    raise ArgumentError, "Can't create DataFrame from #{source}"
  end
end
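For the Array-of-Arrays case the listing validates the row lengths and then transposes rows into columns. A plain-Ruby sketch of that step follows; `rows_to_columns` and its `order` argument are hypothetical and return a Hash rather than a DataFrame.

```ruby
# Hypothetical helper mirroring the Array-of-Arrays branch: check that all
# rows have the same length, then transpose them into named columns.
def rows_to_columns(rows, order)
  unless rows.all? { |r| r.size == rows.first.size }
    raise ArgumentError, 'All rows must have same length'
  end

  order.zip(rows.transpose).to_h
end

cols = rows_to_columns([[1, 6], [2, 7], [3, 8]], %i[a b])
p cols[:a] # => [1, 2, 3]
p cols[:b] # => [6, 7, 8]
```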
Instance Method Details
#==(other) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1869
def == other
  self.class == other.class   &&
    @size    == other.size    &&
    @index   == other.index   &&
    @vectors == other.vectors &&
    @vectors.to_a.all? { |v| self[v] == other[v] }
end
#[](*names) ⇒ Object
Access row or vector. Specify name of row/vector followed by axis (:row, :vector). Defaults to :vector. Use of this method is not recommended for accessing rows. Use df.row[:a] for accessing row with index ‘:a’.
# File 'lib/daru/dataframe.rb', line 280
def [](*names)
  axis = extract_axis(names, :vector)
  dispatch_to_axis axis, :access, *names
end
#[]=(*args) ⇒ Object
Insert a new row/vector of the specified name or modify a previous row. Instead of using this method directly, use df.row[:a] = [1,2,3] to set/create a row ‘:a’ to [1,2,3], or the equivalent assignment through df.vector for vectors.
In case a Daru::Vector is specified after the equals sign, the indexes of the vector will be matched against the row/vector indexes of the DataFrame before an insertion is performed. Unmatched indexes will be set to nil.
# File 'lib/daru/dataframe.rb', line 424
def []=(*args)
  vector = args.pop
  axis = extract_axis(args)
  names = args

  dispatch_to_axis axis, :insert_or_modify, names, vector
end
#_dump(_depth) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1810
def _dump(_depth)
  Marshal.dump(
    data:  @data,
    index: @index.to_a,
    order: @vectors.to_a,
    name:  @name
  )
end
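`#_dump` and `._load` are the hooks Ruby's Marshal calls for custom serialization. The round trip can be sketched with a small class; `Table` is hypothetical and only borrows the pairing shown in the listings.

```ruby
# Hypothetical class (not part of Daru) with the same _dump/_load pairing.
class Table
  attr_reader :data, :name

  def initialize(data, name: nil)
    @data = data
    @name = name
  end

  # Called by Marshal.dump; must return a String.
  def _dump(_depth)
    Marshal.dump(data: @data, name: @name)
  end

  # Called by Marshal.load with the String that _dump produced.
  def self._load(str)
    h = Marshal.load(str)
    new(h[:data], name: h[:name])
  end
end

copy = Marshal.load(Marshal.dump(Table.new([[1, 2], [3, 4]], name: :t)))
p copy.data # => [[1, 2], [3, 4]]
p copy.name # => :t
```

Only the fields placed in the Hash by `_dump` survive the round trip, so any state that should persist has to be listed there.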
#add_row(row, index = nil) ⇒ Object
# File 'lib/daru/dataframe.rb', line 432
def add_row row, index=nil
  self.row[index || @size] = row
end
#add_vector(n, vector) ⇒ Object
# File 'lib/daru/dataframe.rb', line 436
def add_vector n, vector
  self[n] = vector
end
#add_vectors_by_split(name, join = '-', sep = Daru::SPLIT_TOKEN) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1064
def add_vectors_by_split(name,join='-',sep=Daru::SPLIT_TOKEN)
  self[name]
    .split_by_separator(sep)
    .each { |k,v| self["#{name}#{join}#{k}".to_sym] = v }
end
#add_vectors_by_split_recode(nm, join = '-', sep = Daru::SPLIT_TOKEN) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1638
def add_vectors_by_split_recode(nm, join='-', sep=Daru::SPLIT_TOKEN)
  self[nm]
    .split_by_separator(sep)
    .each_with_index do |(k, v), i|
      v.rename "#{nm}:#{k}"
      self["#{nm}#{join}#{i + 1}".to_sym] = v
    end
end
#all?(axis = :vector, &block) ⇒ Boolean
Works like Array#all?
# File 'lib/daru/dataframe.rb', line 1121
def all? axis=:vector, &block
  if axis == :vector || axis == :column
    @data.all?(&block)
  elsif axis == :row
    each_row.all?(&block)
  else
    raise ArgumentError, "Unidentified axis #{axis}"
  end
end
#any?(axis = :vector, &block) ⇒ Boolean
Works like Array#any?.
# File 'lib/daru/dataframe.rb', line 1099
def any? axis=:vector, &block
  if axis == :vector || axis == :column
    @data.any?(&block)
  elsif axis == :row
    each_row do |row|
      return true if yield(row)
    end
    return false
  else
    raise ArgumentError, "Unidentified axis #{axis}"
  end
end
#at(*positions) ⇒ Daru::Vector, Daru::DataFrame
Retrieve vectors by positions
# File 'lib/daru/dataframe.rb', line 362
def at *positions
  if AXES.include? positions.last
    axis = positions.pop
    return row_at(*positions) if axis == :row
  end

  original_positions = positions
  positions = coerce_positions(*positions, ncols)
  validate_positions(*positions, ncols)

  if positions.is_a? Integer
    @data[positions].dup
  else
    Daru::DataFrame.new positions.map { |pos| @data[pos].dup },
      index: @index,
      order: @vectors.at(*original_positions),
      name: @name
  end
end
#bootstrap(n = nil) ⇒ Daru::DataFrame
Creates a DataFrame with random data, of size n. If n is not given, uses the original number of rows.
# File 'lib/daru/dataframe.rb', line 890
def bootstrap(n=nil)
  n ||= nrows
  Daru::DataFrame.new({}, order: @vectors).tap do |df_boot|
    n.times do
      df_boot.add_row(row[rand(n)])
    end
    df_boot.update
  end
end
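Bootstrapping here means drawing rows at random with replacement. A minimal sketch on an Array of rows follows; `bootstrap_rows` and its `rng:` keyword are hypothetical, and the sketch draws positions from the number of available rows, which is an assumption of this example rather than a statement about the listing.

```ruby
# Hypothetical helper: draw n rows with replacement from an Array of rows.
def bootstrap_rows(rows, n = rows.size, rng: Random.new)
  Array.new(n) { rows[rng.rand(rows.size)] }
end

rows = [[1, :a], [2, :b], [3, :c]]
sample = bootstrap_rows(rows, 5, rng: Random.new(42))
p sample.size                            # => 5
p sample.all? { |r| rows.include?(r) }   # => true
```

Passing a seeded `Random` makes a resample reproducible, which is convenient in tests.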
#clone(*vectors_to_clone) ⇒ Object
Returns a ‘view’ of the DataFrame, i.e. the object IDs of vectors are preserved.
Arguments
vectors_to_clone
- Names of vectors to clone. Optional. Will return a view of the whole data frame otherwise.
# File 'lib/daru/dataframe.rb', line 476
def clone *vectors_to_clone
  vectors_to_clone.flatten! if ArrayHelper.array_of?(vectors_to_clone, Array)
  vectors_to_clone = @vectors.to_a if vectors_to_clone.empty?

  h = vectors_to_clone.map { |vec| [vec, self[vec]] }.to_h
  Daru::DataFrame.new(h, clone: false, order: vectors_to_clone, name: @name)
end
#clone_only_valid ⇒ Object
Returns a ‘shallow’ copy of DataFrame if missing data is not present, or a full copy of only valid data if missing data is present.
# File 'lib/daru/dataframe.rb', line 486
def clone_only_valid
  if include_values?(*Daru::MISSING_VALUES)
    reject_values(*Daru::MISSING_VALUES)
  else
    clone
  end
end
#clone_structure ⇒ Object
Only clone the structure of the DataFrame.
# File 'lib/daru/dataframe.rb', line 465
def clone_structure
  Daru::DataFrame.new([], order: @vectors.dup, index: @index.dup, name: @name)
end
#collect(axis = :vector, &block) ⇒ Object
Iterate over a row or vector and return results in a Daru::Vector. Specify axis with :vector or :row. Defaults to :vector.
Description
The #collect iterator works similarly to #map, the only difference being that it returns a Daru::Vector comprising the results of each block run. The resultant Vector has the same index as that of the axis over which collect has iterated. It also accepts the optional axis argument.
Arguments
-
axis
- The axis to iterate over. Can be :vector (or :column)
or :row. Defaults to :vector.
# File 'lib/daru/dataframe.rb', line 647
def collect axis=:vector, &block
  dispatch_to_axis_pl axis, :collect, &block
end
#collect_matrix ⇒ ::Matrix
Generate a matrix, based on vector names of the DataFrame.
# File 'lib/daru/dataframe.rb', line 842
def collect_matrix
  return to_enum(:collect_matrix) unless block_given?

  vecs = vectors.to_a
  rows = vecs.collect { |row|
    vecs.collect { |col|
      yield row,col
    }
  }

  Matrix.rows(rows)
end
#collect_row_with_index(&block) ⇒ Object
# File 'lib/daru/dataframe.rb', line 816
def collect_row_with_index &block
  return to_enum(:collect_row_with_index) unless block_given?

  Daru::Vector.new(each_row_with_index.map(&block), index: @index)
end
#collect_rows(&block) ⇒ Object
Retrieves a Daru::Vector, based on the result of calculation performed on each row.
# File 'lib/daru/dataframe.rb', line 810
def collect_rows &block
  return to_enum(:collect_rows) unless block_given?

  Daru::Vector.new(each_row.map(&block), index: @index)
end
#collect_vector_with_index(&block) ⇒ Object
# File 'lib/daru/dataframe.rb', line 830
def collect_vector_with_index &block
  return to_enum(:collect_vector_with_index) unless block_given?

  Daru::Vector.new(each_vector_with_index.map(&block), index: @vectors)
end
#collect_vectors(&block) ⇒ Object
Retrieves a Daru::Vector, based on the result of calculation performed on each vector.
# File 'lib/daru/dataframe.rb', line 824
def collect_vectors &block
  return to_enum(:collect_vectors) unless block_given?

  Daru::Vector.new(each_vector.map(&block), index: @vectors)
end
#compute(text, &block) ⇒ Object
Returns a vector computed from a string expression that refers to the DataFrame's vectors.
The calculation will be eval’ed, so you can use any variable or expression that is valid in Ruby.
For example:
a = Daru::Vector.new [1,2]
b = Daru::Vector.new [3,4]
ds = Daru::DataFrame.new({:a => a,:b => b})
ds.compute("a+b")
=> Vector [4,6]
# File 'lib/daru/dataframe.rb', line 989
def compute text, &block
  return instance_eval(&block) if block_given?
  instance_eval(text)
end
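The listing relies on `instance_eval`, which runs a string or block with the receiver as `self`, so column names resolve as method calls. The sketch below shows the mechanism with `ColumnScope`, a hypothetical class that exposes plain Arrays as reader methods; it does not reproduce Daru's vector arithmetic.

```ruby
# Hypothetical stand-in for a DataFrame: each column becomes a reader method
# so that an evaluated expression can refer to it by name.
class ColumnScope
  def initialize(columns)
    columns.each { |name, values| define_singleton_method(name) { values } }
  end

  def compute(text = nil, &block)
    return instance_eval(&block) if block_given?

    instance_eval(text)
  end
end

scope = ColumnScope.new({ a: [1, 2], b: [3, 4] })
p scope.compute('a.zip(b).map { |x, y| x + y }') # => [4, 6]
p scope.compute { a.sum + b.sum }                # => 10
```

Since the string is executed as Ruby code, it should come from a trusted source.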
#concat(other_df) ⇒ Object
Concatenate another DataFrame along corresponding columns. If columns do not exist in both dataframes, they are filled with nils
# File 'lib/daru/dataframe.rb', line 1226
def concat other_df
  vectors = (@vectors.to_a + other_df.vectors.to_a).uniq

  data = vectors.map do |v|
    get_vector_anyways(v).dup.concat(other_df.get_vector_anyways(v))
  end

  Daru::DataFrame.new(data, order: vectors)
end
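The nil-filling rule is easy to see on plain Hashes of column name => Array. `concat_columns` below is a hypothetical helper written for this illustration, assuming every column within one Hash has the same length.

```ruby
# Hypothetical helper on Hashes of column name => Array (not Daru objects).
def concat_columns(left, right)
  left_size  = left.values.first.to_a.size
  right_size = right.values.first.to_a.size

  (left.keys + right.keys).uniq.to_h do |name|
    # A column missing on one side contributes nils for that side's rows.
    top    = left.fetch(name)  { [nil] * left_size }
    bottom = right.fetch(name) { [nil] * right_size }
    [name, top + bottom]
  end
end

out = concat_columns({ a: [1, 2], b: [3, 4] }, { a: [5], c: [6] })
p out[:a] # => [1, 2, 5]
p out[:b] # => [3, 4, nil]
p out[:c] # => [nil, nil, 6]
```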
#create_sql(table, charset = 'UTF8') ⇒ Object
Create an SQL CREATE TABLE statement based on a given Dataset
Arguments
-
table - String specifying name of the table that will be created in SQL.
-
charset - Character set. Default is “UTF8”.
# File 'lib/daru/dataframe.rb', line 1663
def create_sql(table,charset='UTF8')
  sql    = "CREATE TABLE #{table} ("
  fields = vectors.to_a.collect do |f|
    v = self[f]
    f.to_s + ' ' + v.db_type
  end

  sql + fields.join(",\n ")+") CHARACTER SET=#{charset};"
end
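The shape of the generated statement can be previewed without a DataFrame. In this sketch `create_table_sql` is a hypothetical helper and the SQL types are supplied by hand; in the listing they come from each vector's `db_type`.

```ruby
# Hypothetical helper: `columns` maps a column name to an SQL type string.
def create_table_sql(table, columns, charset = 'UTF8')
  fields = columns.map { |name, type| "#{name} #{type}" }
  "CREATE TABLE #{table} (" + fields.join(",\n ") + ") CHARACTER SET=#{charset};"
end

puts create_table_sql('people', { id: 'INTEGER', name: 'VARCHAR (255)' })
# CREATE TABLE people (id INTEGER,
#  name VARCHAR (255)) CHARACTER SET=UTF8;
```

The table and column names are interpolated as given, so they are expected to be valid SQL identifiers already.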
#delete_row(index) ⇒ Object
Delete a row
# File 'lib/daru/dataframe.rb', line 874
def delete_row index
  idx = named_index_for index

  raise IndexError, "Index #{index} does not exist." unless @index.include? idx
  @index = Daru::Index.new(@index.to_a - [idx])
  each_vector do |vector|
    vector.delete_at idx
  end

  set_size
end
#delete_vector(vector) ⇒ Object
Delete a vector
# File 'lib/daru/dataframe.rb', line 857
def delete_vector vector
  raise IndexError, "Vector #{vector} does not exist." unless @vectors.include?(vector)

  @data.delete_at @vectors[vector]
  @vectors = Daru::Index.new @vectors.to_a - [vector]

  self
end
#delete_vectors(*vectors) ⇒ Object
Deletes a list of vectors
# File 'lib/daru/dataframe.rb', line 867
def delete_vectors *vectors
  Array(vectors).each { |vec| delete_vector vec }

  self
end
#dup(vectors_to_dup = nil) ⇒ Object
Duplicate the DataFrame entirely.
Arguments
-
vectors_to_dup
- An Array specifying the names of Vectors to
be duplicated. Will duplicate the entire DataFrame if not specified.
# File 'lib/daru/dataframe.rb', line 455
def dup vectors_to_dup=nil
  vectors_to_dup = @vectors.to_a unless vectors_to_dup

  src = vectors_to_dup.map { |vec| @data[@vectors[vec]].dup }
  new_order = Daru::Index.new(vectors_to_dup)

  Daru::DataFrame.new src, order: new_order, index: @index.dup, name: @name, clone: true
end
#dup_only_valid(vecs = nil) ⇒ Object
Creates a new duplicate dataframe containing only rows without a single missing value.
# File 'lib/daru/dataframe.rb', line 496
def dup_only_valid vecs=nil
  rows_with_nil = @data.map { |vec| vec.indexes(*Daru::MISSING_VALUES) }
                       .inject(&:concat)
                       .uniq

  row_indexes = @index.to_a
  (vecs.nil? ? self : dup(vecs)).row[*(row_indexes - rows_with_nil)]
end
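The row selection amounts to keeping only the positions at which every column holds a value. A plain-Ruby sketch follows; `only_complete_rows` is hypothetical and treats only nil as missing, which is a simplifying assumption compared with `Daru::MISSING_VALUES`.

```ruby
# Hypothetical helper on a Hash of column name => Array.
def only_complete_rows(columns)
  size = columns.values.first.to_a.size
  # Positions at which no column holds nil.
  keep = (0...size).reject { |i| columns.values.any? { |col| col[i].nil? } }
  columns.transform_values { |col| col.values_at(*keep) }
end

clean = only_complete_rows({ a: [1, nil, 3, 4], b: [5, 6, nil, 8] })
p clean[:a] # => [1, 4]
p clean[:b] # => [5, 8]
```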
#each(axis = :vector, &block) ⇒ Object
Iterate over each row or vector of the DataFrame. Specify axis by passing :vector or :row as the argument. Defaults to :vector.
Description
`#each` works exactly like Array#each. The default mode for `each` is to iterate over the columns of the DataFrame. To iterate over rows you must pass the axis, i.e. `:row`, as an argument.
Arguments
-
axis
- The axis to iterate over. Can be :vector (or :column)
or :row. Defaults to :vector.
# File 'lib/daru/dataframe.rb', line 628
def each axis=:vector, &block
  dispatch_to_axis axis, :each, &block
end
#each_index(&block) ⇒ Object
Iterate over each index of the DataFrame.
# File 'lib/daru/dataframe.rb', line 562
def each_index &block
  return to_enum(:each_index) unless block_given?

  @index.each(&block)

  self
end
#each_row ⇒ Object
Iterate over each row
# File 'lib/daru/dataframe.rb', line 595
def each_row
  return to_enum(:each_row) unless block_given?

  @index.size.times do |pos|
    yield row_at(pos)
  end

  self
end
#each_row_with_index ⇒ Object
# File 'lib/daru/dataframe.rb', line 605
def each_row_with_index
  return to_enum(:each_row_with_index) unless block_given?

  @index.each do |index|
    yield access_row(index), index
  end

  self
end
#each_vector(&block) ⇒ Object Also known as: each_column
Iterate over each vector
# File 'lib/daru/dataframe.rb', line 571
def each_vector(&block)
  return to_enum(:each_vector) unless block_given?

  @data.each(&block)

  self
end
#each_vector_with_index ⇒ Object Also known as: each_column_with_index
Iterate over each vector along with the name of the vector
# File 'lib/daru/dataframe.rb', line 582
def each_vector_with_index
  return to_enum(:each_vector_with_index) unless block_given?

  @vectors.each do |vector|
    yield @data[@vectors[vector]], vector
  end

  self
end
#filter(axis = :vector, &block) ⇒ Object
Retain vectors or rows if the block returns a truthy value.
Description
For filtering out certain rows/vectors based on their values, use the #filter method. By default it iterates over vectors and keeps those vectors for which the block returns true. It accepts an optional axis argument which lets you specify whether you want to iterate over vectors or rows.
Arguments
-
axis
- The axis to map over. Can be :vector (or :column) or :row.
Defaults to :vector.
Usage
# Filter vectors
df.filter do |vector|
  vector.type == :numeric and vector.median < 50
end
# Filter rows
df.filter(:row) do |row|
  row[:a] + row[:d] < 100
end
# File 'lib/daru/dataframe.rb', line 736 def filter axis=:vector, &block dispatch_to_axis_pl axis, :filter, &block end |
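To make the difference between the two axes concrete, here is a plain-Ruby sketch that does not use Daru. It assumes a hypothetical column store (a Hash from vector name to values), which is only an illustration and not Daru's internal layout.

```ruby
# Hypothetical column store: vector name => values. Not Daru's internal layout.
columns = { a: [10, 60, 30], d: [20, 70, 40], label: %w[x y z] }

# Axis :vector - keep whole columns for which the block is truthy.
numeric_columns = columns.select { |_name, values| values.all? { |v| v.is_a?(Numeric) } }

# Axis :row - rebuild each row as a hash, then keep rows for which the block is truthy.
rows = columns.values.transpose.map { |values| columns.keys.zip(values).to_h }
small_rows = rows.select { |row| row[:a] + row[:d] < 100 }
```

Filtering by vector changes which columns survive, while filtering by row changes which observations survive; the column set is untouched in the second case.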
#filter_rows ⇒ Object
Iterates over each row and retains it in a new DataFrame if the block returns true for that row.
# File 'lib/daru/dataframe.rb', line 919 def filter_rows return to_enum(:filter_rows) unless block_given? keep_rows = @index.map { |index| yield access_row(index) } where keep_rows end |
#filter_vector(vec, &block) ⇒ Object
Creates a new vector with the data of a given field, taken from the rows for which the block returns true.
# File 'lib/daru/dataframe.rb', line 913 def filter_vector vec, &block Daru::Vector.new each_row.select(&block).map { |row| row[vec] } end |
#filter_vectors(&block) ⇒ Object
Iterates over each vector and retains it in a new DataFrame if the block returns true for that vector.
# File 'lib/daru/dataframe.rb', line 929 def filter_vectors &block return to_enum(:filter_vectors) unless block_given? dup.tap { |df| df.keep_vector_if(&block) } end |
#get_vector_anyways(v) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1220 def get_vector_anyways(v) @vectors.include?(v) ? self[v].to_a : [nil] * size end |
#group_by(*vectors) ⇒ Object
Group elements by vector to perform operations on them. Returns a Daru::Core::GroupBy object. See the Daru::Core::GroupBy docs for a detailed list of possible operations.
Arguments
-
vectors - An Array containing names of vectors to group by.
Usage
df = Daru::DataFrame.new({
a: %w{foo bar foo bar foo bar foo foo},
b: %w{one one two three two two one three},
c: [1 ,2 ,3 ,1 ,3 ,6 ,3 ,8],
d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
})
df.group_by([:a,:b,:c]).groups
#=> {["bar", "one", 2]=>[1],
# ["bar", "three", 1]=>[3],
# ["bar", "two", 6]=>[5],
# ["foo", "one", 1]=>[0],
# ["foo", "one", 3]=>[6],
# ["foo", "three", 8]=>[7],
# ["foo", "two", 3]=>[2, 4]}
# File 'lib/daru/dataframe.rb', line 1199 def group_by *vectors vectors.flatten! # FIXME: wouldn't it better to do vectors - @vectors here and # raise one error with all non-existent vector names?.. - zverok, 2016-05-18 vectors.each { |v| raise(ArgumentError, "Vector #{v} does not exist") unless has_vector?(v) } Daru::Core::GroupBy.new(self, vectors) end |
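The groups hash in the usage example maps each distinct tuple of grouping values to the row positions where it occurs. The following plain-Ruby sketch rebuilds that mapping from the same data without Daru; it illustrates the shape of the result and is not how Daru::Core::GroupBy is implemented.

```ruby
a = %w[foo bar foo bar foo bar foo foo]
b = %w[one one two three two two one three]
c = [1, 2, 3, 1, 3, 6, 3, 8]

# Map each distinct (a, b, c) tuple to the row positions where it occurs.
groups = Hash.new { |hash, key| hash[key] = [] }
a.zip(b, c).each_with_index { |key, position| groups[key] << position }

# Sort by key so the order matches the listing in the usage example.
sorted_groups = groups.sort.to_h
```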
#has_missing_data? ⇒ Boolean Also known as: flawed?
# File 'lib/daru/dataframe.rb', line 1011 def has_missing_data? !!@data.any? { |vec| vec.include_values?(*Daru::MISSING_VALUES) } end |
#has_vector?(vector) ⇒ Boolean
Check if a vector is present
# File 'lib/daru/dataframe.rb', line 1086 def has_vector? vector @vectors.include? vector end |
#head(quantity = 10) ⇒ Object Also known as: first
The first quantity rows of the DataFrame (ten by default).
# File 'lib/daru/dataframe.rb', line 1134 def head quantity=10 row.at 0..(quantity-1) end |
#include_values?(*values) ⇒ true, false
Check if any of given values occur in the data frame
# File 'lib/daru/dataframe.rb', line 1030 def include_values?(*values) @data.any? { |vec| vec.include_values?(*values) } end |
#inspect(spacing = 10, threshold = 15) ⇒ Object
Pretty print in a nice table format for the command line (irb/pry/iruby)
# File 'lib/daru/dataframe.rb', line 1850 def inspect spacing=10, threshold=15 row_headers = index.is_a?(MultiIndex) ? index.sparse_tuples : index.to_a name_part = @name ? ": #{@name} " : '' "#<#{self.class}#{name_part}(#{nrows}x#{ncols})>\n" + Formatters::Table.format( each_row.lazy, row_headers: row_headers, headers: vectors, threshold: threshold, spacing: spacing ) end |
#interact_code(vector_names, full) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1908 def interact_code vector_names, full dfs = vector_names.zip(full).map do |vec_name, f| self[vec_name].contrast_code(full: f).each.to_a end all_vectors = recursive_product(dfs) Daru::DataFrame.new all_vectors, order: all_vectors.map(&:name) end |
#join(other_df, opts = {}) ⇒ Daru::DataFrame
Join 2 DataFrames with SQL style joins. Currently supports inner, left outer, right outer and full outer joins.
# File 'lib/daru/dataframe.rb', line 1586 def join(other_df,opts={}) Daru::Core::Merge.join(self, other_df, opts) end |
#keep_row_if ⇒ Object
# File 'lib/daru/dataframe.rb', line 900 def keep_row_if @index .reject { |idx| yield access_row(idx) } .each { |idx| delete_row idx } end |
#keep_vector_if ⇒ Object
# File 'lib/daru/dataframe.rb', line 906 def keep_vector_if @vectors.each do |vector| delete_vector(vector) unless yield(@data[@vectors[vector]], vector) end end |
#map(axis = :vector, &block) ⇒ Object
Map over each vector or row of the data frame according to the argument specified. Will return an Array of the resulting elements. To map over each row/vector and get a DataFrame, see #recode.
Description
The #map iterator works like Array#map. The value returned by each run of the block is added to an Array and the Array is returned. This method also accepts an axis argument, like #each. The default is :vector.
Arguments
-
axis - The axis to map over. Can be :vector (or :column) or :row. Defaults to :vector.
# File 'lib/daru/dataframe.rb', line 667 def map axis=:vector, &block dispatch_to_axis_pl axis, :map, &block end |
#map!(axis = :vector, &block) ⇒ Object
Destructive map. Modifies the DataFrame. Each run of the block must return a Daru::Vector. You can specify the axis to map over as the argument. Defaults to :vector.
Arguments
-
axis - The axis to map over. Can be :vector (or :column) or :row. Defaults to :vector.
# File 'lib/daru/dataframe.rb', line 679 def map! axis=:vector, &block if axis == :vector || axis == :column map_vectors!(&block) elsif axis == :row map_rows!(&block) end end |
#map_rows(&block) ⇒ Object
Map each row
# File 'lib/daru/dataframe.rb', line 786 def map_rows &block return to_enum(:map_rows) unless block_given? each_row.map(&block) end |
#map_rows! ⇒ Object
# File 'lib/daru/dataframe.rb', line 798 def map_rows! return to_enum(:map_rows!) unless block_given? index.dup.each do |i| row[i] = should_be_vector!(yield(row[i])) end self end |
#map_rows_with_index(&block) ⇒ Object
# File 'lib/daru/dataframe.rb', line 792 def map_rows_with_index &block return to_enum(:map_rows_with_index) unless block_given? each_row_with_index.map(&block) end |
#map_vectors(&block) ⇒ Object
Map each vector and return an Array.
# File 'lib/daru/dataframe.rb', line 761 def map_vectors &block return to_enum(:map_vectors) unless block_given? @data.map(&block) end |
#map_vectors! ⇒ Object
Destructive form of #map_vectors
# File 'lib/daru/dataframe.rb', line 768 def map_vectors! return to_enum(:map_vectors!) unless block_given? vectors.dup.each do |n| self[n] = should_be_vector!(yield(self[n])) end self end |
#map_vectors_with_index(&block) ⇒ Object
Map vectors along with the index.
# File 'lib/daru/dataframe.rb', line 779 def map_vectors_with_index &block return to_enum(:map_vectors_with_index) unless block_given? each_vector_with_index.map(&block) end |
#merge(other_df) ⇒ Daru::DataFrame
Merge vectors from two DataFrames. In case of a name collision, the vector names are changed to x_1, x_2, and so on.
# File 'lib/daru/dataframe.rb', line 1544 def merge other_df # rubocop:disable Metrics/AbcSize raise ArgumentError, "Number of rows must be equal in this: #{nrows} and other: #{other_df.nrows}" \ unless nrows == other_df.nrows new_fields = (@vectors.to_a + other_df.vectors.to_a) new_fields = ArrayHelper.recode_repeated(new_fields) DataFrame.new({}, order: new_fields).tap do |df_new| (0...nrows).each do |i| df_new.add_row row[i].to_a + other_df.row[i].to_a end df_new.update end end |
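The body of ArrayHelper.recode_repeated is not shown on this page, so the following is only a sketch of the renaming rule described above (x_1, x_2 for colliding names). The helper name suffix_repeated is hypothetical, and the real helper may differ in details such as how it treats symbols.

```ruby
# Hypothetical helper, written to follow the x_1, x_2 naming described above.
def suffix_repeated(names)
  totals = names.each_with_object(Hash.new(0)) { |name, count| count[name] += 1 }
  seen = Hash.new(0)
  names.map do |name|
    next name if totals[name] == 1 # unique names are left untouched

    seen[name] += 1
    "#{name}_#{seen[name]}"
  end
end

# Vector names of the left frame followed by those of the right frame.
merged_names = suffix_repeated(%w[id score] + %w[score grade])
```

Under these assumptions, the shared name score becomes score_1 for the left frame and score_2 for the right one, while id and grade keep their names.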
#missing_values_rows(missing_values = [nil]) ⇒ Object Also known as: vector_missing_values
Return a vector with the number of missing values in each row.
Arguments
-
missing_values - An Array of the values that should be treated as ‘missing’. The default missing value is nil.
# File 'lib/daru/dataframe.rb', line 1000 def missing_values_rows missing_values=[nil] number_of_missing = each_row.map do |row| row.indexes(*missing_values).size end Daru::Vector.new number_of_missing, index: @index, name: "#{@name}_missing_rows" end |
#ncols ⇒ Object
The number of vectors
# File 'lib/daru/dataframe.rb', line 1081 def ncols @vectors.size end |
#nest(*tree_keys, &_block) ⇒ Object
Return a nested hash using vector names as keys and an array constructed of hashes with other values. If a block is provided, it is used to provide the values, and receives three parameters: row (the current row of the dataset), current (the last hash in the hierarchy) and name (the key to include).
# File 'lib/daru/dataframe.rb', line 1038 def nest *tree_keys, &_block tree_keys = tree_keys[0] if tree_keys[0].is_a? Array each_row.each_with_object({}) do |row, current| # Create tree *keys, last = tree_keys current = keys.inject(current) { |c, f| c[row[f]] ||= {} } name = row[last] if block_given? current[name] = yield(row, current, name) else current[name] ||= [] current[name].push(row.to_h.delete_if { |key,_value| tree_keys.include? key }) end end end |
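The nesting loop above can be tried out on plain hashes. The sketch below mirrors the non-block branch of the method over an Array of row hashes; nest_rows and the sample data are hypothetical and exist only for illustration.

```ruby
# Plain-Ruby version of the non-block branch, over an Array of row hashes.
def nest_rows(rows, *tree_keys)
  rows.each_with_object({}) do |row, tree|
    *keys, last = tree_keys
    # Walk (and create) one hash level per leading key.
    leaf = keys.inject(tree) { |level, key| level[row[key]] ||= {} }
    # Store the remaining fields under the value of the last key.
    (leaf[row[last]] ||= []) << row.reject { |key, _value| tree_keys.include?(key) }
  end
end

rows = [
  { country: 'IN', city: 'Pune',  year: 2015, sales: 10 },
  { country: 'IN', city: 'Pune',  year: 2016, sales: 12 },
  { country: 'JP', city: 'Tokyo', year: 2015, sales: 20 }
]
nested = nest_rows(rows, :country, :city)
```

Each key in tree_keys adds one level to the hash, and the rows that share all key values end up in the same innermost array.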
#nrows ⇒ Object
The number of rows
# File 'lib/daru/dataframe.rb', line 1076 def nrows @index.size end |
#numeric_vector_names ⇒ Object
# File 'lib/daru/dataframe.rb', line 1348 def numeric_vector_names @vectors.select { |v| self[v].numeric? } end |
#numeric_vectors ⇒ Object
Return the indexes of all the numeric vectors. Will include vectors with nils along with numbers.
# File 'lib/daru/dataframe.rb', line 1341 def numeric_vectors # FIXME: Why _with_index ?.. each_vector_with_index .select { |vec, _i| vec.numeric? } .map(&:last) end |
#one_to_many(parent_fields, pattern) ⇒ Object
Creates a new dataset for one-to-many relations on a dataset, based on a pattern of field names.
For example, suppose you have a survey on number of children with this structure:
id, name, child_name_1, child_age_1, child_name_2, child_age_2
With
ds.one_to_many([:id], "child_%v_%n")
the fields named in the first parameter are copied verbatim to the new dataset, and the fields that match the pattern in the second parameter are added as one case for each different %n.
# File 'lib/daru/dataframe.rb', line 1621 def one_to_many(parent_fields, pattern) vars, numbers = one_to_many_components(pattern) DataFrame.new([], order: [*parent_fields, '_col_id', *vars]).tap do |ds| each_row do |row| verbatim = parent_fields.map { |f| [f, row[f]] }.to_h numbers.each do |n| generated = one_to_many_row row, n, vars, pattern next if generated.values.all?(&:nil?) ds.add_row(verbatim.merge(generated).merge('_col_id' => n)) end end ds.update end end |
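The helpers one_to_many_components and one_to_many_row are not shown here, so the following plain-Ruby sketch only approximates the reshaping described above. It assumes that %v captures a variable name and %n captures a number; one_to_many_rows and the survey data are hypothetical.

```ruby
# Hypothetical plain-Ruby version working on an Array of row hashes.
def one_to_many_rows(rows, parent_fields, pattern)
  # Turn "child_%v_%n" into a regexp with a variable-name and a number capture.
  source = Regexp.escape(pattern)
                 .sub('%v') { '(?<var>.+?)' }
                 .sub('%n') { '(?<num>\d+)' }
  matcher = Regexp.new("\\A#{source}\\z")

  rows.flat_map do |row|
    # Bucket the matching fields by their %n number.
    by_number = Hash.new { |hash, number| hash[number] = {} }
    row.each do |field, value|
      match = matcher.match(field.to_s)
      by_number[match[:num].to_i][match[:var]] = value if match
    end

    verbatim = parent_fields.map { |field| [field, row[field]] }.to_h
    by_number.sort_by { |number, _vars| number }
             .reject { |_number, vars| vars.values.all?(&:nil?) } # skip empty cases
             .map { |number, vars| verbatim.merge(vars).merge('_col_id' => number) }
  end
end

survey = [
  { id: 1, child_name_1: 'Ann', child_age_1: 7, child_name_2: nil,  child_age_2: nil },
  { id: 2, child_name_1: 'Bo',  child_age_1: 3, child_name_2: 'Cy', child_age_2: 5 }
]
long = one_to_many_rows(survey, [:id], 'child_%v_%n')
```

The first respondent contributes one case and the second contributes two, because cases whose generated values are all nil are skipped, as in the method body above.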
#only_numerics(opts = {}) ⇒ Object
Return a DataFrame of only the numerical Vectors. If clone: false is specified as option, only a view of the Vectors will be returned. Defaults to clone: true.
# File 'lib/daru/dataframe.rb', line 1355 def only_numerics opts={} cln = opts[:clone] == false ? false : true arry = numeric_vectors.map { |v| self[v] } order = Index.new(numeric_vectors) Daru::DataFrame.new(arry, clone: cln, order: order, index: @index) end |
#pivot_table(opts = {}) ⇒ Object
Pivots a data frame on specified vectors and applies an aggregate function to quickly generate a summary.
Options
:index - Keys to group by on the pivot table row index. Pass vector names contained in an Array.
:vectors - Keys to group by on the pivot table column index. Pass vector names contained in an Array.
:agg - Function to aggregate the grouped values. Defaults to :mean. Can use any of the statistics functions applicable to Vectors that can be found in the Daru::Maths::Statistics::Vector module.
:values - Columns to aggregate. Optional; if omitted, all numeric columns not specified in :index or :vectors are considered.
Usage
df = Daru::DataFrame.new({
a: ['foo' , 'foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
b: ['one' , 'one', 'one', 'two', 'two', 'one', 'one', 'two', 'two'],
c: ['small','large','large','small','small','large','small','large','small'],
d: [1,2,2,3,3,4,5,6,7],
e: [2,4,4,6,6,8,10,12,14]
})
df.pivot_table(index: [:a], vectors: [:b], agg: :sum, values: :e)
#=>
# #<Daru::DataFrame:88342020 @name = 08cdaf4e-b154-4186-9084-e76dd191b2c9 @size = 2>
# [:e, :one] [:e, :two]
# [:bar] 18 26
# [:foo] 10 12
# File 'lib/daru/dataframe.rb', line 1523 def pivot_table opts={} raise ArgumentError, 'Specify grouping index' if opts[:index].to_a.empty? index = opts[:index] vectors = opts[:vectors] || [] aggregate_function = opts[:agg] || :mean values = prepare_pivot_values index, vectors, opts raise IndexError, 'No numeric vectors to aggregate' if values.empty? grouped = group_by(index) return grouped.send(aggregate_function) if vectors.empty? super_hash = make_pivot_hash grouped, vectors, values, aggregate_function pivot_dataframe super_hash end |
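The numbers in the usage example can be checked by hand. This plain-Ruby sketch performs the same aggregation (rows keyed by :a, columns keyed by :b, :e summed) on the example data, without Daru; it illustrates the result rather than the make_pivot_hash implementation.

```ruby
a = %w[foo foo foo foo foo bar bar bar bar]
b = %w[one one one two two one one two two]
e = [2, 4, 4, 6, 6, 8, 10, 12, 14]

# Rows are keyed by :a, columns by :b, and :e is aggregated with a sum.
pivot = Hash.new { |rows, row_key| rows[row_key] = Hash.new(0) }
a.zip(b, e).each { |row_key, column_key, value| pivot[row_key][column_key] += value }
```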
#plotting_library=(lib) ⇒ Object
# File 'lib/daru/dataframe.rb', line 264 def plotting_library= lib case lib when :gruff, :nyaplot @plotting_library = lib extend Module.const_get( "Daru::Plotting::DataFrame::#{lib.to_s.capitalize}Library" ) if Daru.send("has_#{lib}?".to_sym) else raise ArguementError, "Plotting library #{lib} not supported. "\ 'Supported libraries are :nyaplot and :gruff' end end |
#recast(opts = {}) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1832 def recast opts={} opts.each do |vector_name, dtype| self[vector_name].cast(dtype: dtype) end end |
#recode(axis = :vector, &block) ⇒ Object
Maps over the DataFrame and returns a DataFrame. Each run of the block must return a Daru::Vector object. You can specify the axis to map over. Defaults to :vector.
Description
Recode works similarly to #map, but an important difference between the two is that recode returns a modified Daru::DataFrame instead of an Array. For this reason, #recode expects every run of the block to return a Daru::Vector.
Just like map and each, recode also accepts an optional axis argument.
Arguments
-
axis - The axis to map over. Can be :vector (or :column) or :row. Defaults to :vector.
# File 'lib/daru/dataframe.rb', line 704 def recode axis=:vector, &block dispatch_to_axis_pl axis, :recode, &block end |
#recode_rows ⇒ Object
# File 'lib/daru/dataframe.rb', line 750 def recode_rows block_given? or return to_enum(:recode_rows) dup.tap do |df| df.each_row_with_index do |r, i| df.row[i] = should_be_vector!(yield(r)) end end end |
#recode_vectors ⇒ Object
# File 'lib/daru/dataframe.rb', line 740 def recode_vectors block_given? or return to_enum(:recode_vectors) dup.tap do |df| df.each_vector_with_index do |v, i| df[*i] = should_be_vector!(yield(v)) end end end |
#reindex(new_index) ⇒ Object
Change the index of the DataFrame and preserve the labels of the previous indexing. New index can be Daru::Index or any of its subclasses.
# File 'lib/daru/dataframe.rb', line 1269 def reindex new_index raise ArgumentError, 'Must pass the new index of type Index or its '\ "subclasses, not #{new_index.class}" unless new_index.is_a?(Daru::Index) cl = Daru::DataFrame.new({}, order: @vectors, index: new_index, name: @name) new_index.each_with_object(cl) do |idx, memo| memo.row[idx] = @index.include?(idx) ? row[idx] : [nil]*ncols end end |
#reindex_vectors(new_vectors) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1210 def reindex_vectors new_vectors raise ArgumentError, 'Must pass the new index of type Index or its '\ "subclasses, not #{new_index.class}" unless new_vectors.is_a?(Daru::Index) cl = Daru::DataFrame.new({}, order: new_vectors, index: @index, name: @name) new_vectors.each_with_object(cl) do |vec, memo| memo[vec] = @vectors.include?(vec) ? self[vec] : [nil]*nrows end end |
#reject_values(*values) ⇒ Daru::DataFrame
Returns a dataframe in which rows with any of the mentioned values are ignored.
# File 'lib/daru/dataframe.rb', line 522 def reject_values(*values) positions = size.times.to_a - @data.flat_map { |vec| vec.positions(*values) } # Handle the case when positions size is 1 and #row_at wouldn't return a df if positions.size == 1 pos = positions.first row_at(pos..pos) else row_at(*positions) end end |
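The method body above works by set difference: it collects every position where any vector holds a rejected value and subtracts those from the full list of positions. A plain-Ruby sketch of the same idea, over a hypothetical column store that is not Daru's own structure:

```ruby
# Hypothetical column store; nil and :NA are the values to reject here.
columns = { a: [1, nil, 3, 4], b: [:NA, 6, 7, 8] }
rejected = [nil, :NA]
size = columns.values.first.size

# Positions where any column holds a rejected value...
bad_positions = columns.values.flat_map do |values|
  values.each_index.select { |pos| rejected.include?(values[pos]) }
end
# ...are subtracted from all positions, as in the method body above.
kept_positions = size.times.to_a - bad_positions
kept_rows = kept_positions.map { |pos| columns.transform_values { |values| values[pos] } }
```

A row is dropped as soon as one of its cells matches, so rows 0 and 1 disappear here even though each has only one offending value.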
#rename(new_name) ⇒ Object Also known as: name=
Rename the DataFrame.
# File 'lib/daru/dataframe.rb', line 1757 def rename new_name @name = new_name self end |
#rename_vectors(name_map) ⇒ Object
Renames the vectors
Arguments
-
name_map - A hash where the keys are the existing vector names and the values are the new names. If a vector is renamed to a vector name that is already in use, the existing one is overwritten.
Usage
df = Daru::DataFrame.new({ a: [1,2,3,4], b: [:a,:b,:c,:d], c: [11,22,33,44] })
df.rename_vectors :a => :alpha, :c => :gamma
df.vectors.to_a #=> [:alpha, :b, :gamma]
# File 'lib/daru/dataframe.rb', line 1331 def rename_vectors name_map existing_targets = name_map.select { |k,v| k != v }.values & vectors.to_a delete_vectors(*existing_targets) new_names = vectors.to_a.map { |v| name_map[v] ? name_map[v] : v } self.vectors = Daru::Index.new new_names end |
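The overwrite rule is easiest to see on a small example. The sketch below follows the two steps of the method body above (drop vectors whose names are rename targets, then rename) on a plain Hash; rename_columns and the sample frame are hypothetical.

```ruby
# Hypothetical column store (name => values); Ruby hashes keep insertion order.
def rename_columns(columns, name_map)
  # Targets that already exist are dropped first, as in the method body above.
  overwritten = name_map.reject { |old_name, new_name| old_name == new_name }.values & columns.keys
  columns.reject { |name, _values| overwritten.include?(name) }
         .map { |name, values| [name_map.fetch(name, name), values] }
         .to_h
end

frame = { a: [1, 2], b: [:x, :y], c: [11, 22] }
renamed = rename_columns(frame, { a: :alpha, c: :gamma })
clobbered = rename_columns(frame, { a: :b }) # :b already exists, so it is replaced
```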
#replace_values(old_values, new_value) ⇒ Daru::DataFrame
Replace specified values with given value
# File 'lib/daru/dataframe.rb', line 556 def replace_values old_values, new_value @data.each { |vec| vec.replace_values old_values, new_value } self end |
#report_building(b) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1368 def report_building(b) # :nodoc: # b.section(name: @name) do |g| g.text "Number of rows: #{nrows}" @vectors.each do |v| g.text "Element:[#{v}]" g.parse_element(self[v]) end end end |
#respond_to_missing?(name, include_private = false) ⇒ Boolean
# File 'lib/daru/dataframe.rb', line 1904 def respond_to_missing?(name, include_private=false) name.to_s.end_with?('=') || has_vector?(name) || super end |
#row ⇒ Object
Access a row or set/create a row. Refer to the #[] and #[]= docs for details.
Usage
df.row[:a] # access row named ':a'
df.row[:b] = [1,2,3] # set row ':b' to [1,2,3]
# File 'lib/daru/dataframe.rb', line 445 def row Daru::Accessors::DataFrameByRow.new(self) end |
#row_at(*positions) ⇒ Daru::Vector, Daru::DataFrame
Retrieve rows by positions.
# File 'lib/daru/dataframe.rb', line 298 def row_at *positions original_positions = positions positions = coerce_positions(*positions, nrows) validate_positions(*positions, nrows) if positions.is_a? Integer return Daru::Vector.new @data.map { |vec| vec.at(*positions) }, index: @vectors else new_rows = @data.map { |vec| vec.at(*original_positions) } return Daru::DataFrame.new new_rows, index: @index.at(*original_positions), order: @vectors end end |
#save(filename) ⇒ Object
Use marshalling to save dataframe to a file.
# File 'lib/daru/dataframe.rb', line 1806 def save filename Daru::IO.save self, filename end |
#set_at(positions, vector) ⇒ Object
Set vectors by positions
# File 'lib/daru/dataframe.rb', line 397 def set_at positions, vector if positions.last == :row positions.pop return set_row_at(positions, vector) end validate_positions(*positions, ncols) vector = if vector.is_a? Daru::Vector vector.reindex @index else Daru::Vector.new vector end raise SizeError, 'Vector length should match index length' if vector.size != @index.size positions.each { |pos| @data[pos] = vector } end |
#set_index(new_index, opts = {}) ⇒ Object
Set a particular column as the new index of the DataFrame. All values in that column must be unique.
# File 'lib/daru/dataframe.rb', line 1237 def set_index new_index, opts={} raise ArgumentError, 'All elements in new index must be unique.' if @size != self[new_index].uniq.size self.index = Daru::Index.new(self[new_index].to_a) delete_vector(new_index) unless opts[:keep] self end |
#set_row_at(positions, vector) ⇒ Object
Set rows by positions
# File 'lib/daru/dataframe.rb', line 329 def set_row_at positions, vector validate_positions(*positions, nrows) vector = if vector.is_a? Daru::Vector vector.reindex @vectors else Daru::Vector.new vector end raise SizeError, 'Vector length should match row length' if vector.size != @vectors.size @data.each_with_index do |vec, pos| vec.set_at(positions, vector.at(pos)) end @index = @data[0].index set_size end |
#shape ⇒ Object
Return the number of rows and columns of the DataFrame in an Array.
# File 'lib/daru/dataframe.rb', line 1071 def shape [nrows, ncols] end |
#sort(vector_order, opts = {}) ⇒ Object
Non-destructive version of #sort!
# File 'lib/daru/dataframe.rb', line 1485 def sort vector_order, opts={} dup.sort! vector_order, opts end |
#sort!(vector_order, opts = {}) ⇒ Object
Sorts a dataframe (ascending/descending) in the given priority sequence of vectors, with or without a block.
# File 'lib/daru/dataframe.rb', line 1461 def sort! vector_order, opts={} raise ArgumentError, 'Required atleast one vector name' if vector_order.empty? # To enable sorting with categorical data, # map categories to integers preserving their order old = convert_categorical_vectors vector_order block = sort_prepare_block vector_order, opts order = @index.size.times.sort(&block) new_index = @index.reorder order # To reverse map mapping of categorical data to integers restore_categorical_vectors old @data.each do |vector| vector.reorder! order end self.index = new_index self end |
#split_by_category(cat_name) ⇒ Array
Split the dataframe into many dataframes based on category vector
# File 'lib/daru/dataframe.rb', line 1936 def split_by_category cat_name cat_dv = self[cat_name] raise ArguementError, "#{cat_name} is not a category vector" unless cat_dv.category? cat_dv.categories.map do |cat| where(cat_dv.eq cat) .rename(cat) .delete_vector cat_name end end |
#summary(method = :to_text) ⇒ Object
Generate a summary of this DataFrame with ReportBuilder.
# File 'lib/daru/dataframe.rb', line 1364 def summary(method=:to_text) ReportBuilder.new(no_title: true).add(self).send(method) end |
#tail(quantity = 10) ⇒ Object Also known as: last
The last quantity rows of the DataFrame (ten by default).
# File 'lib/daru/dataframe.rb', line 1143 def tail quantity=10 start = [-quantity, -size].max row.at start..-1 end |
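The max call in the method body clamps the start position so that asking for more rows than exist still returns the whole frame. A sketch of that arithmetic on a plain Array, where tail_range is a hypothetical helper and Array slicing stands in for row.at:

```ruby
rows = %w[r0 r1 r2]

# Same arithmetic as #tail: never start before the first row.
def tail_range(size, quantity = 10)
  start = [-quantity, -size].max
  start..-1
end

last_two = rows[tail_range(rows.size, 2)] # start is -2
all_rows = rows[tail_range(rows.size)]    # start is clamped from -10 to -3
unclamped = rows[-10..-1]                 # an out-of-range start yields nil for an Array
```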
#to_a ⇒ Object
Converts the DataFrame into an array of hashes where the key is the vector name and the value is the corresponding element. Index 0 of the returned array contains the array of hashes, while index 1 contains the index labels of the rows of the dataframe. Each element of the index array corresponds to the row at the same position in the array of hashes.
# File 'lib/daru/dataframe.rb', line 1711 def to_a [each_row.map(&:to_h), @index.to_a] end |
#to_category(*names) ⇒ Daru::DataFrame
Converts the specified non category type vectors to category type vectors
# File 'lib/daru/dataframe.rb', line 1889 def to_category *names names.each { |n| self[n] = self[n].to_category } self end |
#to_df ⇒ self
Returns the dataframe. This can be convenient when the user does not know whether the object is a vector or a dataframe.
# File 'lib/daru/dataframe.rb', line 1676 def to_df self end |
#to_gsl ⇒ Object
Convert all numeric vectors to GSL::Matrix
# File 'lib/daru/dataframe.rb', line 1681 def to_gsl numerics_as_arrays = numeric_vectors.map { |n| self[n].to_a } GSL::Matrix.alloc(*numerics_as_arrays.transpose) end |
#to_h ⇒ Object
Converts DataFrame to a hash (explicit) with keys as vector names and values as the corresponding vectors.
# File 'lib/daru/dataframe.rb', line 1727 def to_h @vectors .each_with_index .map { |vec_name, idx| [vec_name, @data[idx]] }.to_h end |
#to_hash ⇒ Object
NOTE: This alias will soon be removed. Use to_h in all future work.
# File 'lib/daru/monkeys.rb', line 69 alias :to_hash :to_h |
#to_html(threshold = 30) ⇒ Object
Convert to html for IRuby.
# File 'lib/daru/dataframe.rb', line 1734 def to_html threshold=30 path = if index.is_a?(MultiIndex) File.expand_path('../iruby/templates/dataframe_mi.html.erb', __FILE__) else File.expand_path('../iruby/templates/dataframe.html.erb', __FILE__) end ERB.new(File.read(path).strip).result(binding) end |
#to_json(no_index = true) ⇒ Object
Convert to JSON. If no_index is true (the default), the index will NOT be included in the JSON thus created; if it is false, the index is included alongside the rows.
# File 'lib/daru/dataframe.rb', line 1717 def to_json no_index=true if no_index to_a[0].to_json else to_a.to_json end end |
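Following the method body above, the flag chooses between serializing only the row hashes and serializing the two-element array returned by #to_a. A sketch with plain Ruby data, where row_hashes, index and frame_json are hypothetical stand-ins:

```ruby
require 'json'

# Hypothetical stand-ins for the two elements returned by #to_a.
row_hashes = [{ a: 1, b: 'x' }, { a: 2, b: 'y' }]
index = %w[first second]

# Same branch as #to_json: rows only when no_index is true, rows plus index otherwise.
def frame_json(row_hashes, index, no_index = true)
  no_index ? row_hashes.to_json : [row_hashes, index].to_json
end

without_index = frame_json(row_hashes, index)
with_index = frame_json(row_hashes, index, false)
```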
#to_matrix ⇒ Object
Convert all vectors of type :numeric into a Matrix.
# File 'lib/daru/dataframe.rb', line 1688 def to_matrix Matrix.columns each_vector.select(&:numeric?).map(&:to_a) end |
#to_nmatrix ⇒ Object
Convert all vectors of type :numeric and not containing nils into an NMatrix.
# File 'lib/daru/dataframe.rb', line 1700 def to_nmatrix each_vector.select do |vector| vector.numeric? && !vector.include_values?(*Daru::MISSING_VALUES) end.map(&:to_a).transpose.to_nm end |
#to_nyaplotdf ⇒ Object
Return a Nyaplot::DataFrame from the data of this DataFrame.
# File 'lib/daru/dataframe.rb', line 1694 def to_nyaplotdf Nyaplot::DataFrame.new(to_a[0]) end |
#to_REXP ⇒ Object
# File 'lib/daru/extensions/rserve.rb', line 5 def to_REXP # rubocop:disable Style/MethodName names = @vectors.to_a data = names.map do |f| Rserve::REXP::Wrapper.wrap(self[f].to_a) end l = Rserve::Rlist.new(data, names.map(&:to_s)) Rserve::REXP.create_data_frame(l) end |
#to_s ⇒ Object
# File 'lib/daru/dataframe.rb', line 1743 def to_s to_html end |
#transpose ⇒ Object
Transpose a DataFrame, transposing elements and row, column indexing.
# File 'lib/daru/dataframe.rb', line 1839 def transpose Daru::DataFrame.new( each_vector.map(&:to_a).transpose, index: @vectors, order: @index, dtype: @dtype, name: @name ) end |
#update ⇒ Object
Method for updating the metadata (i.e. missing value positions) of the vectors after assignment/deletion etc. are complete. This is provided so that time is not wasted in creating the metadata for the vector each time assignment/deletion of elements is done. Updating data this way is called lazy loading. To set or unset lazy loading, see the .lazy_update= method.
# File 'lib/daru/dataframe.rb', line 1752 def update @data.each(&:update) if Daru.lazy_update end |
#vector_by_calculation(&block) ⇒ Object
DSL for yielding each row and returning a Daru::Vector based on the value each run of the block returns.
Usage
a1 = Daru::Vector.new([1, 2, 3, 4, 5, 6, 7])
a2 = Daru::Vector.new([10, 20, 30, 40, 50, 60, 70])
a3 = Daru::Vector.new([100, 200, 300, 400, 500, 600, 700])
ds = Daru::DataFrame.new({ :a => a1, :b => a2, :c => a3 })
total = ds.vector_by_calculation { a + b + c }
# <Daru::Vector:82314050 @name = nil @size = 7 >
# nil
# 0 111
# 1 222
# 2 333
# 3 444
# 4 555
# 5 666
# 6 777
# File 'lib/daru/dataframe.rb', line 971 def vector_by_calculation &block a = each_row.map { |r| r.instance_eval(&block) } Daru::Vector.new a, index: @index end |
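The block in the usage example can refer to a, b and c directly because instance_eval runs it with each row as self. The sketch below shows the same mechanism with a Struct as a hypothetical row type; Row and by_calculation are illustration names, not part of Daru.

```ruby
# Hypothetical row type: a Struct whose members play the role of vector names.
Row = Struct.new(:a, :b, :c)
rows = [Row.new(1, 10, 100), Row.new(2, 20, 200), Row.new(3, 30, 300)]

# instance_eval runs the block with each row as self,
# so a, b and c resolve to that row's values.
def by_calculation(rows, &block)
  rows.map { |row| row.instance_eval(&block) }
end

totals = by_calculation(rows) { a + b + c }
```

One consequence of this design is that the block cannot see instance variables or private helpers of the calling object, since self is the row while the block runs.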
#vector_count_characters(vecs = nil) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1056 def vector_count_characters vecs=nil vecs ||= @vectors.to_a collect_rows do |row| vecs.map { |v| row[v].to_s.size }.inject(:+) end end |
#vector_mean(max_missing = 0) ⇒ Object
Calculate mean of the rows of the dataframe.
Arguments
-
max_missing - The maximum number of elements in the row that can be missing for the mean calculation to happen. Defaults to 0.
# File 'lib/daru/dataframe.rb', line 1165 def vector_mean max_missing=0 # FIXME: in vector_sum we preserve created vector dtype, but # here we are not. Is this by design or ...? - zverok, 2016-05-18 mean_vec = Daru::Vector.new [0]*@size, index: @index, name: "mean_#{@name}" each_row_with_index.each_with_object(mean_vec) do |(row, i), memo| memo[i] = row.indexes(*Daru::MISSING_VALUES).size > max_missing ? nil : row.mean end end |
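The effect of max_missing can be sketched on plain arrays. This example assumes that nil is the only missing value and that the mean is taken over the non-missing elements; both are assumptions made for illustration, and row_means is a hypothetical helper.

```ruby
# Hypothetical helper over plain arrays; nil stands in for a missing value.
def row_means(rows, max_missing = 0)
  rows.map do |row|
    present = row.compact
    missing = row.size - present.size
    # Too many missing values (or nothing left to average): no mean for this row.
    next nil if missing > max_missing || present.empty?

    present.sum.to_f / present.size
  end
end

rows = [[1, 2, 3], [4, nil, 6]]
strict = row_means(rows)     # default max_missing of 0
lenient = row_means(rows, 1) # tolerate one missing value per row
```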
#vector_sum(vecs = nil) ⇒ Object
Returns a vector with the sum of all vectors specified in the argument. If the vecs parameter is empty, sums all numeric vectors.
# File 'lib/daru/dataframe.rb', line 1152 def vector_sum vecs=nil vecs ||= numeric_vectors sum = Daru::Vector.new [0]*@size, index: @index, name: @name, dtype: @dtype vecs.inject(sum) { |memo, n| memo + self[n] } end |
#verify(*tests) ⇒ Object
Test each row with one or more tests. Each test is a Proc with the form *Proc.new {|row| row > 0}*
The function returns an array with all errors.
FIXME: description here is too sparse. As far as I can get, it should tell something about that each test is [descr, fields, block], and that first value may be column name to output. - zverok, 2016-05-18
# File 'lib/daru/dataframe.rb', line 943 def verify(*tests) id = tests.first.is_a?(Symbol) ? tests.shift : @vectors.first each_row_with_index.map do |row, i| tests.reject { |*_, block| block.call(row) } .map { |test| row, test, id, i } end.flatten end |
#where(bool_array) ⇒ Object
Query a DataFrame by passing a Daru::Core::Query::BoolArray object.
# File 'lib/daru/dataframe.rb', line 1865 def where bool_array Daru::Core::Query.df_where self, bool_array end |
#write_csv(filename, opts = {}) ⇒ Object
Write this DataFrame to a CSV file.
Arguments
-
filename - Path of CSV file where the DataFrame is to be saved.
Options
-
convert_comma - If set to true, will convert any commas in any of the data to full stops (‘.’). All the options accepted by CSV.read() can also be passed into this function.
# File 'lib/daru/dataframe.rb', line 1776 def write_csv filename, opts={} Daru::IO.dataframe_write_csv self, filename, opts end |
#write_excel(filename, opts = {}) ⇒ Object
Write this dataframe to an Excel Spreadsheet
Arguments
-
filename - The path of the file where the DataFrame should be written.
# File 'lib/daru/dataframe.rb', line 1785 def write_excel filename, opts={} Daru::IO.dataframe_write_excel self, filename, opts end |
#write_sql(dbh, table) ⇒ Object
Insert each case of the Dataset into the selected table.
Arguments
-
dbh - DBI database connection object.
-
table - Name of the table into which the cases are inserted.
Usage
ds = Daru::DataFrame.new({:id=>Daru::Vector.new([1,2,3]), :name=>Daru::Vector.new(["a","b","c"])})
dbh = DBI.connect("DBI:Mysql:database:localhost", "user", "password")
ds.write_sql(dbh,"test")
# File 'lib/daru/dataframe.rb', line 1801 def write_sql dbh, table Daru::IO.dataframe_write_sql self, dbh, table end |