Class: Daru::DataFrame
- Extended by:
- Gem::Deprecate
- Defined in:
- lib/daru/dataframe.rb,
lib/daru/extensions/rserve.rb,
lib/daru/extensions/which_dsl.rb
Defined Under Namespace
Modules: SetMultiIndexStrategy, SetSingleIndexStrategy
Instance Attribute Summary collapse
-
#data ⇒ Object
readonly
TOREMOVE.
-
#index ⇒ Object
The index of the rows of the DataFrame.
-
#name ⇒ Object
readonly
The name of the DataFrame.
-
#size ⇒ Object
readonly
The number of rows present in the DataFrame.
-
#vectors ⇒ Object
The vectors (columns) index of the DataFrame.
Class Method Summary collapse
- ._load(data) ⇒ Object
-
.crosstab_by_assignation(rows, columns, values) ⇒ Object
Generates a new dataset from three vectors: rows, columns and values.
-
.from_activerecord(relation, *fields) ⇒ Object
Read a dataframe from AR::Relation.
-
.from_csv(path, opts = {}, &block) ⇒ Object
Load data from a CSV file.
-
.from_excel(path, opts = {}, &block) ⇒ Object
Read data from an Excel file into a DataFrame.
-
.from_html(path, fields = {}) ⇒ Object
Read the table data from a remote html file.
-
.from_plaintext(path, fields) ⇒ Object
Read the database from a plaintext file.
-
.from_sql(dbh, query) ⇒ Object
Reads a database query and returns a Dataset.
-
.rows(source, opts = {}) ⇒ Object
Create DataFrame by specifying rows as an Array of Arrays or Array of Daru::Vector objects.
Instance Method Summary collapse
- #==(other) ⇒ Object
-
#[](*names) ⇒ Object
Access row or vector.
-
#[]=(*args) ⇒ Object
Insert a new row/vector of the specified name or modify a previous row.
- #_dump(_depth) ⇒ Object
-
#access_row_tuples_by_indexs(*indexes) ⇒ Array
Returns array of row tuples at given index(s).
- #add_row(row, index = nil) ⇒ Object
- #add_vector(n, vector) ⇒ Object
- #add_vectors_by_split(name, join = '-', sep = Daru::SPLIT_TOKEN) ⇒ Object
- #add_vectors_by_split_recode(nm, join = '-', sep = Daru::SPLIT_TOKEN) ⇒ Object
-
#aggregate(options = {}, multi_index_level = -1) ⇒ Daru::DataFrame
Function to use for aggregating the data.
-
#all?(axis = :vector, &block) ⇒ Boolean
Works like Array#all?.
-
#any?(axis = :vector, &block) ⇒ Boolean
Works like Array#any?.
- #apply_method(method, keys: nil, by_position: true) ⇒ Object (also: #apply_method_on_sub_df)
-
#at(*positions) ⇒ Daru::Vector, Daru::DataFrame
Retrieve vectors by positions.
-
#bootstrap(n = nil) ⇒ Daru::DataFrame
Creates a DataFrame of n rows drawn at random from this DataFrame.
-
#clone(*vectors_to_clone) ⇒ Object
Returns a ‘view’ of the DataFrame, i.e. the object IDs of the vectors are preserved.
-
#clone_only_valid ⇒ Object
Returns a ‘shallow’ copy of DataFrame if missing data is not present, or a full copy of only valid data if missing data is present.
-
#clone_structure ⇒ Object
Only clone the structure of the DataFrame.
-
#collect(axis = :vector, &block) ⇒ Object
Iterate over a row or vector and return results in a Daru::Vector.
-
#collect_matrix ⇒ ::Matrix
Generate a matrix, based on vector names of the DataFrame.
- #collect_row_with_index(&block) ⇒ Object
-
#collect_rows(&block) ⇒ Object
Retrieves a Daru::Vector, based on the result of calculation performed on each row.
- #collect_vector_with_index(&block) ⇒ Object
-
#collect_vectors(&block) ⇒ Object
Retrieves a Daru::Vector, based on the result of calculation performed on each vector.
-
#compute(text, &block) ⇒ Object
Returns a vector, based on a string with a calculation based on vector.
-
#concat(other_df) ⇒ Object
Concatenate another DataFrame along corresponding columns.
-
#create_sql(table, charset = 'UTF8') ⇒ Object
Create an SQL statement, based on a given Dataset.
-
#delete_row(index) ⇒ Object
Delete a row.
-
#delete_vector(vector) ⇒ Object
Delete a vector.
-
#delete_vectors(*vectors) ⇒ Object
Deletes a list of vectors.
-
#dup(vectors_to_dup = nil) ⇒ Object
Duplicate the DataFrame entirely.
-
#dup_only_valid(vecs = nil) ⇒ Object
Creates a new duplicate dataframe containing only rows without a single missing value.
-
#each(axis = :vector, &block) ⇒ Object
Iterate over each row or vector of the DataFrame.
-
#each_index(&block) ⇒ Object
Iterate over each index of the DataFrame.
-
#each_row ⇒ Object
Iterate over each row.
- #each_row_with_index ⇒ Object
-
#each_vector(&block) ⇒ Object
(also: #each_column)
Iterate over each vector.
-
#each_vector_with_index ⇒ Object
(also: #each_column_with_index)
Iterate over each vector along with the name of the vector.
-
#filter(axis = :vector, &block) ⇒ Object
Retain vectors or rows if the block returns a truthy value.
-
#filter_rows ⇒ Object
Iterates over each row and retains it in a new DataFrame if the block returns true for that row.
-
#filter_vector(vec, &block) ⇒ Object
Creates a new vector with the data of a given field for which the block returns true.
-
#filter_vectors(&block) ⇒ Object
Iterates over each vector and retains it in a new DataFrame if the block returns true for that vector.
-
#get_sub_dataframe(keys, by_position: true) ⇒ Daru::DataFrame
Extract a dataframe given row indexes or positions.
- #get_vector_anyways(v) ⇒ Object
-
#group_by(*vectors) ⇒ Object
Group elements by vector to perform operations on them.
- #group_by_and_aggregate(*group_by_keys, **aggregation_map) ⇒ Object
- #has_missing_data? ⇒ Boolean (also: #flawed?)
-
#has_vector?(vector) ⇒ Boolean
Check if a vector is present.
-
#head(quantity = 10) ⇒ Object
(also: #first)
The first ten elements of the DataFrame.
-
#include_values?(*values) ⇒ true, false
Check if any of given values occur in the data frame.
-
#initialize(source = {}, opts = {}) ⇒ DataFrame
constructor
DataFrame basically consists of an Array of Vector objects.
-
#inspect(spacing = 10, threshold = 15) ⇒ Object
Pretty print in a nice table format for the command line (irb/pry/iruby).
- #interact_code(vector_names, full) ⇒ Object
-
#join(other_df, opts = {}) ⇒ Daru::DataFrame
Join 2 DataFrames with SQL style joins.
- #keep_row_if ⇒ Object
- #keep_vector_if ⇒ Object
-
#map(axis = :vector, &block) ⇒ Object
Map over each vector or row of the data frame according to the argument specified.
-
#map!(axis = :vector, &block) ⇒ Object
Destructive map.
-
#map_rows(&block) ⇒ Object
Map each row.
- #map_rows! ⇒ Object
- #map_rows_with_index(&block) ⇒ Object
-
#map_vectors(&block) ⇒ Object
Map each vector and return an Array.
-
#map_vectors! ⇒ Object
Destructive form of #map_vectors.
-
#map_vectors_with_index(&block) ⇒ Object
Map vectors along with the index.
-
#merge(other_df) ⇒ Daru::DataFrame
Merge vectors from two DataFrames.
- #method_missing(name, *args, &block) ⇒ Object
-
#missing_values_rows(missing_values = [nil]) ⇒ Object
(also: #vector_missing_values)
Return a vector with the number of missing values in each row.
-
#ncols ⇒ Object
The number of vectors.
-
#nest(*tree_keys, &_block) ⇒ Object
Return a nested hash using vector names as keys and an array constructed of hashes with other values.
-
#nrows ⇒ Object
The number of rows.
- #numeric_vector_names ⇒ Object
-
#numeric_vectors ⇒ Object
Return the indexes of all the numeric vectors.
-
#one_to_many(parent_fields, pattern) ⇒ Object
Creates a new dataset for one to many relations on a dataset, based on pattern of field names.
-
#only_numerics(opts = {}) ⇒ Object
Return a DataFrame of only the numerical Vectors.
-
#order=(order_array) ⇒ Object
Reorder the vectors in a dataframe.
-
#pivot_table(opts = {}) ⇒ Object
Pivots a data frame on specified vectors and applies an aggregate function to quickly generate a summary.
-
#plot(*args, **options, &b) ⇒ Object
this method is overwritten: see Daru::DataFrame#plotting_library=.
- #plotting_library=(lib) ⇒ Object
-
#recast(opts = {}) ⇒ Object
Change dtypes of vectors by supplying a hash of :vector_name => :new_dtype.
-
#recode(axis = :vector, &block) ⇒ Object
Maps over the DataFrame and returns a DataFrame.
- #recode_rows ⇒ Object
- #recode_vectors ⇒ Object
-
#reindex(new_index) ⇒ Object
Change the index of the DataFrame and preserve the labels of the previous indexing.
- #reindex_vectors(new_vectors) ⇒ Object
-
#reject_values(*values) ⇒ Daru::DataFrame
Returns a dataframe in which rows with any of the mentioned values are ignored.
-
#rename(new_name) ⇒ Object
(also: #name=)
Rename the DataFrame.
-
#rename_vectors(name_map) ⇒ Object
Renames the vectors.
-
#replace_values(old_values, new_value) ⇒ Daru::DataFrame
Replace specified values with given value.
- #reset_index ⇒ Object
- #respond_to_missing?(name, include_private = false) ⇒ Boolean
- #rolling_fillna(direction = :forward) ⇒ Object
-
#rolling_fillna!(direction = :forward) ⇒ Object
Rolling fillna replaces all Float::NAN and nil values with the preceding or following value.
-
#row ⇒ Object
Access a row or set/create a row.
-
#row_at(*positions) ⇒ Daru::Vector, Daru::DataFrame
Retrieve rows by positions.
-
#save(filename) ⇒ Object
Use marshalling to save dataframe to a file.
-
#set_at(positions, vector) ⇒ Object
Set vectors by positions.
-
#set_index(new_index_col, opts = {}) ⇒ Object
Set a particular column as the new index of the DataFrame.
-
#set_row_at(positions, vector) ⇒ Object
Set rows by positions.
-
#shape ⇒ Object
Return the number of rows and columns of the DataFrame in an Array.
-
#sort(vector_order, opts = {}) ⇒ Object
Non-destructive version of #sort!.
-
#sort!(vector_order, opts = {}) ⇒ Object
Sorts a dataframe (ascending/descending) in the given priority sequence of vectors, with or without a block.
-
#split_by_category(cat_name) ⇒ Array
Split the dataframe into many dataframes based on category vector.
-
#summary ⇒ String
Generate a summary of this DataFrame based on individual vectors in the DataFrame.
-
#tail(quantity = 10) ⇒ Object
(also: #last)
The last ten elements of the DataFrame.
-
#to_a ⇒ Object
Converts the DataFrame into an array of hashes where key is vector name and value is the corresponding element.
-
#to_category(*names) ⇒ Daru::DataFrame
Converts the specified non category type vectors to category type vectors.
-
#to_df ⇒ self
Returns the dataframe.
-
#to_gsl ⇒ Object
Convert all numeric vectors to GSL::Matrix.
-
#to_h ⇒ Object
Converts DataFrame to a hash (explicit) with keys as vector names and values as the corresponding vectors.
-
#to_html(threshold = 30) ⇒ Object
Convert to html for IRuby.
- #to_html_tbody(threshold = 30) ⇒ Object
- #to_html_thead ⇒ Object
-
#to_json(no_index = true) ⇒ Object
Convert to json.
-
#to_matrix ⇒ Object
Convert all vectors of type :numeric into a Matrix.
-
#to_nmatrix ⇒ Object
Convert all vectors of type :numeric and not containing nils into an NMatrix.
-
#to_nyaplotdf ⇒ Object
Return a Nyaplot::DataFrame from the data of this DataFrame.
-
#to_REXP ⇒ Object
- #to_s ⇒ Object
-
#transpose ⇒ Object
Transpose a DataFrame, transposing elements and row, column indexing.
-
#union(other_df) ⇒ Object
Concatenates another DataFrame as #concat.
-
#uniq(*vtrs) ⇒ Object
Return unique rows by vector specified or all vectors.
-
#update ⇒ Object
Method for updating the metadata (i.e. missing value positions) of the DataFrame after assignment, deletion, etc.
-
#vector_by_calculation(&block) ⇒ Object
DSL for yielding each row and returning a Daru::Vector based on the value each run of the block returns.
- #vector_count_characters(vecs = nil) ⇒ Object
-
#vector_mean(max_missing = 0) ⇒ Object
Calculate mean of the rows of the dataframe.
-
#vector_sum(*args) ⇒ Object
Sum all numeric/specified vectors in the DataFrame.
-
#verify(*tests) ⇒ Object
Test each row with one or more tests.
-
#where(bool_array) ⇒ Object
Query a DataFrame by passing a Daru::Core::Query::BoolArray object.
- #which(&block) ⇒ Object
-
#write_csv(filename, opts = {}) ⇒ Object
Write this DataFrame to a CSV file.
-
#write_excel(filename, opts = {}) ⇒ Object
Write this dataframe to an Excel Spreadsheet.
-
#write_sql(dbh, table) ⇒ Object
Insert each case of the Dataset on the selected table.
Methods included from Maths::Statistics::DataFrame
#acf, #correlation, #count, #covariance, #cumsum, #describe, #ema, #max, #mean, #median, #min, #mode, #percent_change, #product, #range, #rolling_count, #rolling_max, #rolling_mean, #rolling_median, #rolling_min, #rolling_std, #rolling_variance, #standardize, #std, #sum, #variance_sample
Methods included from Maths::Arithmetic::DataFrame
#%, #*, #**, #+, #-, #/, #exp, #round, #sqrt
Constructor Details
#initialize(source = {}, opts = {}) ⇒ DataFrame
DataFrame basically consists of an Array of Vector objects. These objects are indexed by row and by column, using the index and vectors Index objects respectively.
Arguments
-
source - Source from which the DataFrame is to be initialized. Can be a Hash of names and vectors (array or Daru::Vector), an array of arrays or an array of Daru::Vectors.
Options
:order
- An Array/Daru::Index/Daru::MultiIndex containing the order in which Vectors should appear in the DataFrame.
:index
- An Array/Daru::Index/Daru::MultiIndex containing the order in which rows of the DataFrame will be named.
:name
- A name for the DataFrame.
:clone
- Specify as true or false. When set to false, and Vector objects are passed for the source, the Vector objects will not be duplicated when creating the DataFrame. Has no effect if an Array is passed in the source, or if the passed Daru::Vectors have different indexes. Defaults to true.
Usage
df = Daru::DataFrame.new
# =>
# <Daru::DataFrame(0x0)>
# Creates an empty DataFrame with no rows or columns.
df = Daru::DataFrame.new({}, order: [:a, :b])
# =>
# <Daru::DataFrame(0x2)>
#   a b
# Creates a DataFrame with no rows and columns :a and :b
df = Daru::DataFrame.new({a: [1,2,3,4], b: [6,7,8,9]}, order: [:b, :a],
index: [:a, :b, :c, :d], name: :spider_man)
# =>
# <Daru::DataFrame:80766980 @name = spider_man @size = 4>
# b a
# a 6 1
# b 7 2
# c 8 3
# d 9 4
df = Daru::DataFrame.new([[1,2,3,4],[6,7,8,9]], name: :bat_man)
# =>
# #<Daru::DataFrame: bat_man (4x2)>
# 0 1
# 0 1 6
# 1 2 7
# 2 3 8
# 3 4 9
# Dataframe having Index name
df = Daru::DataFrame.new({a: [1,2,3,4], b: [6,7,8,9]}, order: [:b, :a],
index: Daru::Index.new([:a, :b, :c, :d], name: 'idx_name'),
name: :spider_man)
# =>
# <Daru::DataFrame:80766980 @name = spider_man @size = 4>
# idx_name b a
# a 6 1
# b 7 2
# c 8 3
# d 9 4
idx = Daru::Index.new [100, 99, 101, 1, 2], name: "s1"
# => #<Daru::Index(5): s1 {100, 99, 101, 1, 2}>
df = Daru::DataFrame.new({b: [11,12,13,14,15], a: [1,2,3,4,5],
c: [11,22,33,44,55]},
order: [:a, :b, :c],
index: idx)
# =>
# <Daru::DataFrame(5x3)>
# s1 a b c
# 100 1 11 11
# 99 2 12 22
# 101 3 13 33
# 1 4 14 44
# 2 5 15 55
# File 'lib/daru/dataframe.rb', line 344
def initialize source={}, opts={} # rubocop:disable Metrics/MethodLength
  vectors, index = opts[:order], opts[:index] # FIXME: just keyword arges after Ruby 2.1
  @data = []
  @name = opts[:name]

  case source
  when [], {}
    create_empty_vectors(vectors, index)
  when Array
    initialize_from_array source, vectors, index, opts
  when Hash
    initialize_from_hash source, vectors, index, opts
  when ->(s) { s.empty? } # TODO: likely want to remove this case
    create_empty_vectors(vectors, index)
  end

  set_size
  validate
  update
end
Dynamic Method Handling
This class handles dynamic methods through the method_missing method
#method_missing(name, *args, &block) ⇒ Object
# File 'lib/daru/dataframe.rb', line 2275
def method_missing(name, *args, &block)
  case
  when name =~ /(.+)\=/
    name = name[/(.+)\=/].delete('=')
    name = name.to_sym unless has_vector?(name)
    insert_or_modify_vector [name], args[0]
  when has_vector?(name)
    self[name]
  when has_vector?(name.to_s)
    self[name.to_s]
  else
    super
  end
end
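The dispatch above turns vector names into reader and writer methods. The following is a minimal plain-Ruby sketch of the same pattern, not Daru's implementation: `TinyFrame` and `has_column?` are hypothetical names, and a Hash stands in for the vector storage.

```ruby
# Hypothetical stand-in for a DataFrame: columns live in a Hash.
class TinyFrame
  def initialize(columns)
    @columns = columns
  end

  def has_column?(name)
    @columns.key?(name)
  end

  def method_missing(name, *args, &block)
    if name.to_s.end_with?('=')
      # Setter form: frame.b = [...] creates or replaces column :b.
      @columns[name.to_s.chomp('=').to_sym] = args.first
    elsif has_column?(name)
      # Reader form: frame.a returns column :a.
      @columns[name]
    else
      # Unknown names fall through to the normal NoMethodError.
      super
    end
  end

  # Keeps respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?('=') || has_column?(name) || super
  end
end

frame = TinyFrame.new(a: [1, 2, 3])
frame.b = [4, 5, 6]
p frame.a
p frame.b
```

Defining `respond_to_missing?` alongside `method_missing` is the usual companion step; the class summary lists both methods for Daru::DataFrame.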
Instance Attribute Details
#data ⇒ Object (readonly)
TOREMOVE
# File 'lib/daru/dataframe.rb', line 244
def data
  @data
end
#index ⇒ Object
The index of the rows of the DataFrame
# File 'lib/daru/dataframe.rb', line 247
def index
  @index
end
#name ⇒ Object (readonly)
The name of the DataFrame
# File 'lib/daru/dataframe.rb', line 250
def name
  @name
end
#size ⇒ Object (readonly)
The number of rows present in the DataFrame
# File 'lib/daru/dataframe.rb', line 253
def size
  @size
end
#vectors ⇒ Object
The vectors (columns) index of the DataFrame
# File 'lib/daru/dataframe.rb', line 242
def vectors
  @vectors
end
Class Method Details
._load(data) ⇒ Object
# File 'lib/daru/dataframe.rb', line 2201
def self._load data
  h = Marshal.load data
  Daru::DataFrame.new(h[:data],
    index: h[:index],
    order: h[:order],
    name: h[:name])
end
.crosstab_by_assignation(rows, columns, values) ⇒ Object
Generates a new dataset, using three vectors:
- Rows
- Columns
- Values
For example, you have these values
x y v
a a 0
a b 1
b a 1
b b 0
You obtain
id a b
a 0 1
b 1 0
Useful to process outputs from databases
# File 'lib/daru/dataframe.rb', line 200
def crosstab_by_assignation rows, columns, values
  raise 'Three vectors should be equal size' if
    rows.size != columns.size || rows.size!=values.size

  data = Hash.new { |h, col|
    h[col] = rows.factors.map { |r| [r, nil] }.to_h
  }
  columns.zip(rows, values).each { |c, r, v| data[c][r] = v }

  # FIXME: in fact, WITHOUT this line you'll obtain more "right"
  # data: with vectors having "rows" as an index...
  data = data.map { |c, r| [c, r.values] }.to_h
  data[:_id] = rows.factors

  DataFrame.new(data)
end
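To make the pivot step concrete, here is a plain-Ruby sketch of the same idea that runs without Daru. It is an illustration under assumptions: `crosstab_sketch` is a hypothetical name, plain Arrays stand in for Daru::Vector objects, and `uniq` stands in for `factors`.

```ruby
# Plain-Ruby sketch; arrays stand in for Daru::Vector objects.
def crosstab_sketch(rows, columns, values)
  unless rows.size == columns.size && rows.size == values.size
    raise ArgumentError, 'Three arrays should be equal size'
  end

  row_keys = rows.uniq
  # One {row_key => nil} hash per distinct column value, created lazily.
  table = Hash.new { |h, col| h[col] = row_keys.to_h { |r| [r, nil] } }
  columns.zip(rows, values).each { |c, r, v| table[c][r] = v }

  # Flatten each per-column hash to its values, then add the row labels.
  result = table.to_h { |col, cells| [col, cells.values] }
  result[:_id] = row_keys
  result
end

# The x / y / v example from the description above.
p crosstab_sketch(%w[a a b b], %w[a b a b], [0, 1, 1, 0])
```

Combinations that never occur in the input stay nil, because every column starts from a hash with one nil entry per row label.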
.from_activerecord(relation, *fields) ⇒ Object
# File 'lib/daru/dataframe.rb', line 101
def from_activerecord relation, *fields
  Daru::IO.from_activerecord relation, *fields
end
.from_csv(path, opts = {}, &block) ⇒ Object
Load data from a CSV file. Specify an optional block to grab the CSV object and pre-condition it (for example use the `convert` or `header_convert` methods).
Arguments
-
path - Local path / Remote URL of the file to load specified as a String.
Options
Accepts the same options as the Daru::DataFrame constructor and CSV.open() and uses those to eventually construct the resulting DataFrame.
Verbose Description
You can specify all the options to the `.from_csv` function that you do to the Ruby `CSV.read()` function, since this is what is used internally.
For example, if the columns in your CSV file are separated by something other than commas, you can use the `:col_sep` option. If you want to convert numeric values to numbers and not keep them as strings, you can use the `:converters` option and set it to `:numeric`.
The `.from_csv` function uses the following defaults for reading CSV files (that are passed into the `CSV.read()` function):
{
:col_sep => ',',
:converters => :numeric
}
# File 'lib/daru/dataframe.rb', line 48
def from_csv path, opts={}, &block
  Daru::IO.from_csv path, opts, &block
end
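The effect of the `:col_sep` and `:converters` options can be seen with Ruby's standard CSV library alone. This sketch does not call Daru; it parses an in-memory string with `CSV.parse` instead of reading a file, and the sample data and the `parse_sample` helper are made up for illustration.

```ruby
require 'csv'

# Semicolon-separated sample data (made up for illustration).
SAMPLE_CSV = "name;score\nalice;1.5\nbob;2\n"

# Parse with a custom separator and numeric conversion.
def parse_sample(text)
  CSV.parse(text, headers: true, col_sep: ';', converters: :numeric)
end

table = parse_sample(SAMPLE_CSV)
p table['name']   # strings are left as strings
p table['score']  # numeric-looking fields become Float or Integer
```

Without `converters: :numeric`, the score column would come back as the strings "1.5" and "2".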
.from_excel(path, opts = {}, &block) ⇒ Object
Read data from an Excel file into a DataFrame.
Arguments
-
path - Path of the file to be read.
Options
:worksheet_id - ID of the worksheet that is to be read.
# File 'lib/daru/dataframe.rb', line 61
def from_excel path, opts={}, &block
  Daru::IO.from_excel path, opts, &block
end
.from_html(path, fields = {}) ⇒ Object
Read the table data from a remote HTML file. Please note that this module works only for static table elements on an HTML page, and won’t work in cases where the data is being loaded into the HTML table by JavaScript.
By default - all <th> tag elements in the first proper row are considered as the order, and all the <th> tag elements in the first column are considered as the index.
Arguments
-
path [String] - URL of the target HTML file.
-
fields [Hash] -
:match
- A String to match and choose a particular table(s) from multiple tables of an HTML page.
:order
- An Array which would act as the user-defined order, to override the parsed Daru::DataFrame.
:index
- An Array which would act as the user-defined index, to override the parsed Daru::DataFrame.
:name
- A String that manually assigns a name to the scraped Daru::DataFrame, for user’s preference.
Returns
An Array of Daru::DataFrames, with each dataframe corresponding to a HTML table on that webpage.
Usage
dfs = Daru::DataFrame.from_html("http://www.moneycontrol.com/", match: "Sun Pharma")
dfs.count
# => 4
dfs.first
#
# => <Daru::DataFrame(5x4)>
# Company Price Change Value (Rs
# 0 Sun Pharma 502.60 -65.05 2,117.87
# 1 Reliance 1356.90 19.60 745.10
# 2 Tech Mahin 379.45 -49.70 650.22
# 3 ITC 315.85 6.75 621.12
# 4 HDFC 1598.85 50.95 553.91
# File 'lib/daru/dataframe.rb', line 160
def from_html path, fields={}
  Daru::IO.from_html path, fields
end
.from_plaintext(path, fields) ⇒ Object
Read the database from a plaintext file. For this method to work, the data should be present in a plain text file in columns. See spec/fixtures/bank2.dat for an example.
Arguments
-
path - Path of the file to be read.
-
fields - Vector names of the resulting database.
Usage
df = Daru::DataFrame.from_plaintext 'spec/fixtures/bank2.dat', [:v1,:v2,:v3,:v4,:v5,:v6]
# File 'lib/daru/dataframe.rb', line 117
def from_plaintext path, fields
  Daru::IO.from_plaintext path, fields
end
.from_sql(dbh, query) ⇒ Object
# File 'lib/daru/dataframe.rb', line 81
def from_sql dbh, query
  Daru::IO.from_sql dbh, query
end
.rows(source, opts = {}) ⇒ Object
Create DataFrame by specifying rows as an Array of Arrays or Array of Daru::Vector objects.
# File 'lib/daru/dataframe.rb', line 166
def rows source, opts={}
  raise SizeError, 'All vectors must have same length' \
    unless source.all? { |v| v.size == source.first.size }

  opts[:order] ||= guess_order(source)

  if ArrayHelper.array_of?(source, Array) || source.empty?
    DataFrame.new(source.transpose, opts)
  elsif ArrayHelper.array_of?(source, Vector)
    from_vector_rows(source, opts)
  else
    raise ArgumentError, "Can't create DataFrame from #{source}"
  end
end
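For the Array-of-Arrays case, the listing checks that all rows have the same length and then transposes them into columns. A plain-Ruby sketch of that step follows; `columns_from_rows` is a hypothetical helper, a Hash of column name to values stands in for the DataFrame, and the default column names (0, 1, ...) are an assumption that mirrors the `bat_man` example in the constructor section.

```ruby
# Turn row-major data into a {column_name => values} Hash.
def columns_from_rows(rows, order: nil)
  unless rows.all? { |r| r.size == rows.first.size }
    raise ArgumentError, 'All rows must have same length'
  end

  # Default column names are positions 0, 1, 2, ... (an assumption of this sketch).
  order ||= (0...(rows.first || []).size).to_a
  order.zip(rows.transpose).to_h
end

p columns_from_rows([[1, 2], [3, 4]], order: %i[a b])
```

The length check matters because `Array#transpose` raises IndexError on ragged input; checking first gives a clearer error message.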
Instance Method Details
#==(other) ⇒ Object
# File 'lib/daru/dataframe.rb', line 2250
def == other
  self.class == other.class &&
    @size == other.size &&
    @index == other.index &&
    @vectors == other.vectors &&
    @vectors.to_a.all? { |v| self[v] == other[v] }
end
#[](*names) ⇒ Object
Access row or vector. Specify the name of the row/vector followed by the axis (:row, :vector). Defaults to :vector. Use of this method is not recommended for accessing rows. Use df.row[:a] for accessing the row with index ‘:a’.
# File 'lib/daru/dataframe.rb', line 390
def [](*names)
  axis = extract_axis(names, :vector)
  dispatch_to_axis axis, :access, *names
end
#[]=(*args) ⇒ Object
Insert a new row/vector of the specified name or modify a previous row. Instead of using this method directly, use df.row[:a] = [1,2,3] to set/create a row ‘:a’ to [1,2,3], or df.vector[:vec] = [1,2,3] for vectors.
In case a Daru::Vector is specified after the equals sign, the indexes of the vector will be matched against the row/vector indexes of the DataFrame before an insertion is performed. Unmatched indexes will be set to nil.
# File 'lib/daru/dataframe.rb', line 532
def []=(*args)
  vector = args.pop
  axis = extract_axis(args)
  names = args

  dispatch_to_axis axis, :insert_or_modify, names, vector
end
#_dump(_depth) ⇒ Object
# File 'lib/daru/dataframe.rb', line 2192
def _dump(_depth)
  Marshal.dump(
    data: @data,
    index: @index.to_a,
    order: @vectors.to_a,
    name: @name
  )
end
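`_dump` and the class method `._load` form the pair that Ruby's Marshal calls for custom serialization. A minimal plain-Ruby sketch of the round trip is shown here; `TinyTable` is a hypothetical class standing in for the DataFrame, and it keeps only a data Hash and a name.

```ruby
class TinyTable
  attr_reader :data, :name

  def initialize(data, name: nil)
    @data = data
    @name = name
  end

  # Called by Marshal.dump; must return a String.
  def _dump(_depth)
    Marshal.dump(data: @data, name: @name)
  end

  # Called by Marshal.load with the String produced by _dump.
  def self._load(bytes)
    h = Marshal.load(bytes)
    new(h[:data], name: h[:name])
  end
end

copy = Marshal.load(Marshal.dump(TinyTable.new({ a: [1, 2] }, name: :demo)))
p copy.data
p copy.name
```

Serializing a plain Hash of the object's parts, rather than the object itself, means the restored instance goes through the constructor again, which is the shape the `._load` listing on this page follows.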
#access_row_tuples_by_indexs(*indexes) ⇒ Array
Returns array of row tuples at given index(s)
# File 'lib/daru/dataframe.rb', line 2365
def access_row_tuples_by_indexs *indexes
  return get_sub_dataframe(indexes, by_position: false).map_rows(&:to_a) if
    @index.is_a?(Daru::MultiIndex)

  positions = @index.pos(*indexes)
  if positions.is_a? Numeric
    row = get_rows_for([positions])
    row.first.is_a?(Array) ? row : [row]
  else
    new_rows = get_rows_for(indexes, by_position: false)
    indexes.map { |index| new_rows.map { |r| r[index] } }
  end
end
#add_row(row, index = nil) ⇒ Object
# File 'lib/daru/dataframe.rb', line 540
def add_row row, index=nil
  self.row[*(index || @size)] = row
end
#add_vector(n, vector) ⇒ Object
# File 'lib/daru/dataframe.rb', line 544
def add_vector n, vector
  self[n] = vector
end
#add_vectors_by_split(name, join = '-', sep = Daru::SPLIT_TOKEN) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1301
def add_vectors_by_split(name,join='-',sep=Daru::SPLIT_TOKEN)
  self[name]
    .split_by_separator(sep)
    .each { |k,v| self["#{name}#{join}#{k}".to_sym] = v }
end
#add_vectors_by_split_recode(nm, join = '-', sep = Daru::SPLIT_TOKEN) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1998
def add_vectors_by_split_recode(nm, join='-', sep=Daru::SPLIT_TOKEN)
  self[nm]
    .split_by_separator(sep)
    .each_with_index do |(k, v), i|
      v.rename "#{nm}:#{k}"
      self["#{nm}#{join}#{i + 1}".to_sym] = v
    end
end
#aggregate(options = {}, multi_index_level = -1) ⇒ Daru::DataFrame
Function to use for aggregating the data.
Note: The `GroupBy` class's `aggregate` method uses this `aggregate` method internally.
# File 'lib/daru/dataframe.rb', line 2425
def aggregate(options={}, multi_index_level=-1)
  if block_given?
    positions_tuples, new_index = yield(@index) # note: use of yield is private for now
  else
    positions_tuples, new_index = group_index_for_aggregation(@index, multi_index_level)
  end

  colmn_value = aggregate_by_positions_tuples(options, positions_tuples)

  Daru::DataFrame.new(colmn_value, index: new_index, order: options.keys)
end
#all?(axis = :vector, &block) ⇒ Boolean
Works like Array#all?
# File 'lib/daru/dataframe.rb', line 1358
def all? axis=:vector, &block
  if %i[vector column].include?(axis)
    @data.all?(&block)
  elsif axis == :row
    each_row.all?(&block)
  else
    raise ArgumentError, "Unidentified axis #{axis}"
  end
end
#any?(axis = :vector, &block) ⇒ Boolean
Works like Array#any?.
# File 'lib/daru/dataframe.rb', line 1336
def any? axis=:vector, &block
  if %i[vector column].include?(axis)
    @data.any?(&block)
  elsif axis == :row
    each_row do |row|
      return true if yield(row)
    end
    false
  else
    raise ArgumentError, "Unidentified axis #{axis}"
  end
end
#apply_method(method, keys: nil, by_position: true) ⇒ Object Also known as: apply_method_on_sub_df
# File 'lib/daru/dataframe.rb', line 1013
def apply_method(method, keys: nil, by_position: true)
  df = keys ? get_sub_dataframe(keys, by_position: by_position) : self

  case method
  when Symbol then df.send(method)
  when Proc then method.call(df)
  when Array
    method.map(&:to_proc).map { |proc| proc.call(df) } # works with Array of both Symbol and/or Proc
  else raise
  end
end
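The `case` in the listing accepts a Symbol, a Proc, or an Array mixing both. A plain-Ruby sketch of that dispatch, applied to an ordinary Array instead of a DataFrame, is given here. `apply_sketch` is a hypothetical name, and it differs from the listing in using `public_send` and raising an explicit ArgumentError for unsupported input.

```ruby
# Apply a Symbol, a Proc, or an Array of either to a target object.
def apply_sketch(target, method)
  case method
  when Symbol then target.public_send(method)
  when Proc   then method.call(target)
  # Symbol#to_proc and Proc#to_proc give every entry a uniform call interface.
  when Array  then method.map(&:to_proc).map { |fn| fn.call(target) }
  else raise ArgumentError, "Unsupported method: #{method.inspect}"
  end
end

numbers = [3, 1, 2]
p apply_sketch(numbers, :max)
p apply_sketch(numbers, ->(a) { a.sum })
p apply_sketch(numbers, [:min, ->(a) { a.size }])
```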
#at(*positions) ⇒ Daru::Vector, Daru::DataFrame
Retrieve vectors by positions
# File 'lib/daru/dataframe.rb', line 470
def at *positions
  if AXES.include? positions.last
    axis = positions.pop
    return row_at(*positions) if axis == :row
  end

  original_positions = positions
  positions = coerce_positions(*positions, ncols)
  validate_positions(*positions, ncols)

  if positions.is_a? Integer
    @data[positions].dup
  else
    Daru::DataFrame.new positions.map { |pos| @data[pos].dup },
      index: @index,
      order: @vectors.at(*original_positions),
      name: @name
  end
end
#bootstrap(n = nil) ⇒ Daru::DataFrame
Creates a DataFrame of n rows drawn at random from this DataFrame. If n is not given, uses the original number of rows.
# File 'lib/daru/dataframe.rb', line 1107
def bootstrap(n=nil)
  n ||= nrows
  Daru::DataFrame.new({}, order: @vectors).tap do |df_boot|
    n.times do
      df_boot.add_row(row[rand(n)])
    end
    df_boot.update
  end
end
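Bootstrapping here means drawing rows at random, with repeats allowed. A plain-Ruby sketch over an Array of rows follows. `bootstrap_rows` and its `rng:` parameter are hypothetical, and unlike the listing, which calls `rand(n)`, the sketch draws positions below `rows.size` so that every position exists even when n is larger than the number of rows.

```ruby
# Draw n rows at random, with replacement, from an Array of rows.
def bootstrap_rows(rows, n = nil, rng: Random.new)
  n ||= rows.size
  Array.new(n) { rows[rng.rand(rows.size)] }
end

rows = [[1, 'a'], [2, 'b'], [3, 'c']]
# Passing a seeded generator makes the draw repeatable.
p bootstrap_rows(rows, 5, rng: Random.new(42))
```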
#clone(*vectors_to_clone) ⇒ Object
Returns a ‘view’ of the DataFrame, i.e. the object IDs of the vectors are preserved.
Arguments
vectors_to_clone
- Names of vectors to clone. Optional. Will return a view of the whole data frame otherwise.
# File 'lib/daru/dataframe.rb', line 598
def clone *vectors_to_clone
  vectors_to_clone.flatten! if ArrayHelper.array_of?(vectors_to_clone, Array)
  vectors_to_clone = @vectors.to_a if vectors_to_clone.empty?

  h = vectors_to_clone.map { |vec| [vec, self[vec]] }.to_h
  Daru::DataFrame.new(h, clone: false, order: vectors_to_clone, name: @name)
end
#clone_only_valid ⇒ Object
Returns a ‘shallow’ copy of DataFrame if missing data is not present, or a full copy of only valid data if missing data is present.
# File 'lib/daru/dataframe.rb', line 608
def clone_only_valid
  if include_values?(*Daru::MISSING_VALUES)
    reject_values(*Daru::MISSING_VALUES)
  else
    clone
  end
end
#clone_structure ⇒ Object
Only clone the structure of the DataFrame.
# File 'lib/daru/dataframe.rb', line 587
def clone_structure
  Daru::DataFrame.new([], order: @vectors.dup, index: @index.dup, name: @name)
end
#collect(axis = :vector, &block) ⇒ Object
Iterate over a row or vector and return results in a Daru::Vector. Specify axis with :vector or :row. Defaults to :vector.
Description
The #collect iterator works similarly to #map, the only difference being that it returns a Daru::Vector comprising the results of each block run. The resultant Vector has the same index as that of the axis over which collect has iterated. It also accepts the optional axis argument.
Arguments
axis
- The axis to iterate over. Can be :vector (or :column) or :row. Defaults to :vector.
# File 'lib/daru/dataframe.rb', line 852
def collect axis=:vector, &block
  dispatch_to_axis_pl axis, :collect, &block
end
#collect_matrix ⇒ ::Matrix
Generate a matrix, based on vector names of the DataFrame.
# File 'lib/daru/dataframe.rb', line 1059
def collect_matrix
  return to_enum(:collect_matrix) unless block_given?

  vecs = vectors.to_a
  rows = vecs.collect { |row|
    vecs.collect { |col|
      yield row,col
    }
  }

  Matrix.rows(rows)
end
#collect_row_with_index(&block) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1033
def collect_row_with_index &block
  return to_enum(:collect_row_with_index) unless block_given?

  Daru::Vector.new(each_row_with_index.map(&block), index: @index)
end
#collect_rows(&block) ⇒ Object
Retrieves a Daru::Vector, based on the result of calculation performed on each row.
# File 'lib/daru/dataframe.rb', line 1027
def collect_rows &block
  return to_enum(:collect_rows) unless block_given?

  Daru::Vector.new(each_row.map(&block), index: @index)
end
#collect_vector_with_index(&block) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1047
def collect_vector_with_index &block
  return to_enum(:collect_vector_with_index) unless block_given?

  Daru::Vector.new(each_vector_with_index.map(&block), index: @vectors)
end
#collect_vectors(&block) ⇒ Object
Retrieves a Daru::Vector, based on the result of calculation performed on each vector.
# File 'lib/daru/dataframe.rb', line 1041
def collect_vectors &block
  return to_enum(:collect_vectors) unless block_given?

  Daru::Vector.new(each_vector.map(&block), index: @vectors)
end
#compute(text, &block) ⇒ Object
Returns a vector, based on a string with a calculation based on vector.
The calculation will be eval’ed, so you can use any variable or expression that is valid in Ruby.
For example:
a = Daru::Vector.new [1,2]
b = Daru::Vector.new [3,4]
ds = Daru::DataFrame.new({:a => a,:b => b})
ds.compute("a+b")
# => Vector [4,6]
# File 'lib/daru/dataframe.rb', line 1226
def compute text, &block
  return instance_eval(&block) if block_given?
  instance_eval(text)
end
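Because the text is run through `instance_eval`, bare names in the expression resolve to methods on the receiver. The following plain-Ruby sketch shows that mechanism without Daru; `TinyCalc` is a hypothetical class that uses integers in place of vectors and resolves names through `method_missing`.

```ruby
class TinyCalc
  def initialize(columns)
    @columns = columns
  end

  # Evaluate a String or a block with self set to this object.
  def compute(text = nil, &block)
    return instance_eval(&block) if block
    instance_eval(text)
  end

  private

  # Bare names such as `a` in the expression land here.
  def method_missing(name, *args, &block)
    @columns.key?(name) ? @columns[name] : super
  end

  def respond_to_missing?(name, include_private = false)
    @columns.key?(name) || super
  end
end

calc = TinyCalc.new(a: 2, b: 3)
p calc.compute('a + b')
p calc.compute { a * b }
```

Since the string is executed as Ruby code, this style of API is best kept away from untrusted input.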
#concat(other_df) ⇒ Object
Concatenate another DataFrame along corresponding columns. If a column does not exist in both DataFrames, its missing entries are filled with nils.
1513 1514 1515 1516 1517 1518 1519 1520 1521 |
# File 'lib/daru/dataframe.rb', line 1513 def concat other_df vectors = (@vectors.to_a + other_df.vectors.to_a).uniq data = vectors.map do |v| get_vector_anyways(v).dup.concat(other_df.get_vector_anyways(v)) end Daru::DataFrame.new(data, order: vectors) end |
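The nil-filling rule can be sketched in plain Ruby. The helper `concat_columns` below is hypothetical and not part of daru; it works on hashes of the form column name => Array and assumes all columns on one side have the same length.

```ruby
# Hypothetical helper, not daru API: concatenates two column hashes
# (name => Array), padding with nil where a column is absent on one side.
def concat_columns(left, right)
  left_size  = left.values.map(&:size).max || 0
  right_size = right.values.map(&:size).max || 0

  (left.keys + right.keys).uniq.map do |name|
    top    = left.fetch(name)  { [nil] * left_size }
    bottom = right.fetch(name) { [nil] * right_size }
    [name, top + bottom]
  end.to_h
end

combined = concat_columns({ a: [1, 2], b: [3, 4] }, { b: [5], c: [6] })
```

Here `:a` is missing from the second hash and `:c` from the first, so each is padded with nils for the rows that come from the other side.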
#create_sql(table, charset = 'UTF8') ⇒ Object
Create an SQL CREATE TABLE statement based on the given dataset.
Arguments
- table - String specifying the name of the table that will be created in SQL.
- charset - Character set. Default is "UTF8".
2023 2024 2025 2026 2027 2028 2029 2030 2031 |
# File 'lib/daru/dataframe.rb', line 2023 def create_sql(table,charset='UTF8') sql = "CREATE TABLE #{table} (" fields = vectors.to_a.collect do |f| v = self[f] f.to_s + ' ' + v.db_type end sql + fields.join(",\n ")+") CHARACTER SET=#{charset};" end |
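To show the shape of the generated statement, here is a plain-Ruby sketch of the same string building. The helper `create_table_sql` is hypothetical, and the `column_types` hash (name => SQL type) stands in for the `db_type` of each vector; the type strings used are illustrative assumptions.

```ruby
# Hypothetical helper mirroring the string building above; `column_types`
# (name => SQL type) stands in for each vector's db_type.
def create_table_sql(table, column_types, charset = 'UTF8')
  fields = column_types.map { |name, type| "#{name} #{type}" }
  "CREATE TABLE #{table} (" + fields.join(",\n ") + ") CHARACTER SET=#{charset};"
end

sql = create_table_sql('people', { id: 'INTEGER', name: 'VARCHAR (255)' })
```

Note that the table and field names are interpolated without quoting or escaping, so they should come from trusted input.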
#delete_row(index) ⇒ Object
Delete a row
1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 |
# File 'lib/daru/dataframe.rb', line 1091 def delete_row index idx = named_index_for index raise IndexError, "Index #{index} does not exist." unless @index.include? idx @index = Daru::Index.new(@index.to_a - [idx]) each_vector do |vector| vector.delete_at idx end set_size end |
#delete_vector(vector) ⇒ Object
Delete a vector
1074 1075 1076 1077 1078 1079 1080 1081 |
# File 'lib/daru/dataframe.rb', line 1074 def delete_vector vector raise IndexError, "Vector #{vector} does not exist." unless @vectors.include?(vector) @data.delete_at @vectors[vector] @vectors = Daru::Index.new @vectors.to_a - [vector] self end |
#delete_vectors(*vectors) ⇒ Object
Deletes a list of vectors
1084 1085 1086 1087 1088 |
# File 'lib/daru/dataframe.rb', line 1084 def delete_vectors *vectors Array(vectors).each { |vec| delete_vector vec } self end |
#dup(vectors_to_dup = nil) ⇒ Object
Duplicate the DataFrame entirely.
Arguments
- vectors_to_dup - An Array specifying the names of Vectors to be duplicated. Will duplicate the entire DataFrame if not specified.
577 578 579 580 581 582 583 584 |
# File 'lib/daru/dataframe.rb', line 577 def dup vectors_to_dup=nil vectors_to_dup = @vectors.to_a unless vectors_to_dup src = vectors_to_dup.map { |vec| @data[@vectors.pos(vec)].dup } new_order = Daru::Index.new(vectors_to_dup) Daru::DataFrame.new src, order: new_order, index: @index.dup, name: @name, clone: true end |
#dup_only_valid(vecs = nil) ⇒ Object
Creates a new duplicate dataframe containing only rows without a single missing value.
618 619 620 621 622 623 624 625 |
# File 'lib/daru/dataframe.rb', line 618 def dup_only_valid vecs=nil rows_with_nil = @data.map { |vec| vec.indexes(*Daru::MISSING_VALUES) } .inject(&:concat) .uniq row_indexes = @index.to_a (vecs.nil? ? self : dup(vecs)).row[*(row_indexes - rows_with_nil)] end |
#each(axis = :vector, &block) ⇒ Object
Iterate over each row or vector of the DataFrame. Specify the axis by passing :vector or :row as the argument. Defaults to :vector.
Description
`#each` works exactly like Array#each. The default mode for `each` is to iterate over the columns of the DataFrame. To iterate over rows you must pass the axis, i.e. `:row`, as an argument.
Arguments
- axis - The axis to iterate over. Can be :vector (or :column) or :row. Defaults to :vector.
833 834 835 |
# File 'lib/daru/dataframe.rb', line 833 def each axis=:vector, &block dispatch_to_axis axis, :each, &block end |
#each_index(&block) ⇒ Object
Iterate over each index of the DataFrame.
767 768 769 770 771 772 773 |
# File 'lib/daru/dataframe.rb', line 767 def each_index &block return to_enum(:each_index) unless block_given? @index.each(&block) self end |
#each_row ⇒ Object
Iterate over each row
800 801 802 803 804 805 806 807 808 |
# File 'lib/daru/dataframe.rb', line 800 def each_row return to_enum(:each_row) unless block_given? @index.size.times do |pos| yield row_at(pos) end self end |
#each_row_with_index ⇒ Object
810 811 812 813 814 815 816 817 818 |
# File 'lib/daru/dataframe.rb', line 810 def each_row_with_index return to_enum(:each_row_with_index) unless block_given? @index.each do |index| yield access_row(index), index end self end |
#each_vector(&block) ⇒ Object Also known as: each_column
Iterate over each vector
776 777 778 779 780 781 782 |
# File 'lib/daru/dataframe.rb', line 776 def each_vector(&block) return to_enum(:each_vector) unless block_given? @data.each(&block) self end |
#each_vector_with_index ⇒ Object Also known as: each_column_with_index
Iterate over each vector along with the name of the vector
787 788 789 790 791 792 793 794 795 |
# File 'lib/daru/dataframe.rb', line 787 def each_vector_with_index return to_enum(:each_vector_with_index) unless block_given? @vectors.each do |vector| yield @data[@vectors[vector]], vector end self end |
#filter(axis = :vector, &block) ⇒ Object
Retain vectors or rows if the block returns a truthy value.
Description
For filtering out certain rows/vectors based on their values, use the #filter method. By default it iterates over vectors and keeps those vectors for which the block returns true. It accepts an optional axis argument which lets you specify whether you want to iterate over vectors or rows.
Arguments
- axis - The axis to filter over. Can be :vector (or :column) or :row. Defaults to :vector.
Usage
# Filter vectors
df.filter do |vector|
vector.type == :numeric and vector.median < 50
end
# Filter rows
df.filter(:row) do |row|
row[:a] + row[:d] < 100
end
941 942 943 |
# File 'lib/daru/dataframe.rb', line 941 def filter axis=:vector, &block dispatch_to_axis_pl axis, :filter, &block end |
#filter_rows ⇒ Object
Iterates over each row and retains it in a new DataFrame if the block returns true for that row.
1136 1137 1138 1139 1140 1141 1142 |
# File 'lib/daru/dataframe.rb', line 1136 def filter_rows return to_enum(:filter_rows) unless block_given? keep_rows = @index.map { |index| yield access_row(index) } where keep_rows end |
#filter_vector(vec, &block) ⇒ Object
Creates a new vector with the data of a given field, taken from the rows for which the block returns true.
1130 1131 1132 |
# File 'lib/daru/dataframe.rb', line 1130 def filter_vector vec, &block Daru::Vector.new(each_row.select(&block).map { |row| row[vec] }) end |
#filter_vectors(&block) ⇒ Object
Iterates over each vector and retains it in a new DataFrame if the block returns true for that vector.
1146 1147 1148 1149 1150 |
# File 'lib/daru/dataframe.rb', line 1146 def filter_vectors &block return to_enum(:filter_vectors) unless block_given? dup.tap { |df| df.keep_vector_if(&block) } end |
#get_sub_dataframe(keys, by_position: true) ⇒ Daru::DataFrame
Extract a dataframe given row indexes or positions
560 561 562 563 564 565 566 567 568 569 |
# File 'lib/daru/dataframe.rb', line 560 def get_sub_dataframe(keys, by_position: true) return Daru::DataFrame.new({}) if keys == [] keys = @index.pos(*keys) unless by_position sub_df = row_at(*keys) sub_df = sub_df.to_df.transpose if sub_df.is_a?(Daru::Vector) sub_df end |
#get_vector_anyways(v) ⇒ Object
1507 1508 1509 |
# File 'lib/daru/dataframe.rb', line 1507 def get_vector_anyways(v) @vectors.include?(v) ? self[v].to_a : [nil] * size end |
#group_by(*vectors) ⇒ Object
Group elements by vector to perform operations on them. Returns a Daru::Core::GroupBy object. See the Daru::Core::GroupBy docs for a detailed list of possible operations.
Arguments
- vectors - An Array containing names of vectors to group by.
Usage
df = Daru::DataFrame.new({
a: %w{foo bar foo bar foo bar foo foo},
b: %w{one one two three two two one three},
c: [1 ,2 ,3 ,1 ,3 ,6 ,3 ,8],
d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
})
df.group_by([:a,:b,:c]).groups
#=> {["bar", "one", 2]=>[1],
# ["bar", "three", 1]=>[3],
# ["bar", "two", 6]=>[5],
# ["foo", "one", 1]=>[0],
# ["foo", "one", 3]=>[6],
# ["foo", "three", 8]=>[7],
# ["foo", "two", 3]=>[2, 4]}
1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 |
# File 'lib/daru/dataframe.rb', line 1483 def group_by *vectors vectors.flatten! missing = vectors - @vectors.to_a unless missing.empty? raise(ArgumentError, "Vector(s) missing: #{missing.join(', ')}") end vectors = [@vectors.first] if vectors.empty? Daru::Core::GroupBy.new(self, vectors) end |
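The `groups` hash in the usage example maps each distinct tuple of key values to the row positions where it occurs. A plain-Ruby sketch that reproduces that mapping from the same data, without daru, might look like this (the `columns` hash is a stand-in for the DataFrame):

```ruby
# Stand-in for the DataFrame from the usage example: column name => values.
columns = {
  a: %w[foo bar foo bar foo bar foo foo],
  b: %w[one one two three two two one three],
  c: [1, 2, 3, 1, 3, 6, 3, 8]
}
keys = %i[a b c]

# Map each distinct key tuple to the row positions where it occurs.
groups = Hash.new { |hash, key| hash[key] = [] }
columns[:a].each_index do |pos|
  tuple = keys.map { |name| columns[name][pos] }
  groups[tuple] << pos
end
```

Only the tuple `["foo", "two", 3]` occurs more than once (rows 2 and 4), which matches the last entry of the documented output.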
#group_by_and_aggregate(*group_by_keys, **aggregation_map) ⇒ Object
2437 2438 2439 |
# File 'lib/daru/dataframe.rb', line 2437 def group_by_and_aggregate(*group_by_keys, **aggregation_map) group_by(*group_by_keys).aggregate(aggregation_map) end |
#has_missing_data? ⇒ Boolean Also known as: flawed?
1248 1249 1250 |
# File 'lib/daru/dataframe.rb', line 1248 def has_missing_data? @data.any? { |vec| vec.include_values?(*Daru::MISSING_VALUES) } end |
#has_vector?(vector) ⇒ Boolean
Check if a vector is present
1323 1324 1325 |
# File 'lib/daru/dataframe.rb', line 1323 def has_vector? vector @vectors.include? vector end |
#head(quantity = 10) ⇒ Object Also known as: first
The first `quantity` rows of the DataFrame (ten by default)
1371 1372 1373 |
# File 'lib/daru/dataframe.rb', line 1371 def head quantity=10 row.at 0..(quantity-1) end |
#include_values?(*values) ⇒ true, false
Check if any of given values occur in the data frame
1267 1268 1269 |
# File 'lib/daru/dataframe.rb', line 1267 def include_values?(*values) @data.any? { |vec| vec.include_values?(*values) } end |
#inspect(spacing = 10, threshold = 15) ⇒ Object
Pretty print in a nice table format for the command line (irb/pry/iruby)
2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 |
# File 'lib/daru/dataframe.rb', line 2232 def inspect spacing=10, threshold=15 name_part = @name ? ": #{@name} " : '' "#<#{self.class}#{name_part}(#{nrows}x#{ncols})>\n" + Formatters::Table.format( each_row.lazy, row_headers: row_headers, headers: headers, threshold: threshold, spacing: spacing ) end |
#interact_code(vector_names, full) ⇒ Object
2294 2295 2296 2297 2298 2299 2300 2301 2302 |
# File 'lib/daru/dataframe.rb', line 2294 def interact_code vector_names, full dfs = vector_names.zip(full).map do |vec_name, f| self[vec_name].contrast_code(full: f).each.to_a end all_vectors = recursive_product(dfs) Daru::DataFrame.new all_vectors, order: all_vectors.map(&:name) end |
#join(other_df, opts = {}) ⇒ Daru::DataFrame
Join 2 DataFrames with SQL style joins. Currently supports inner, left outer, right outer and full outer joins.
1946 1947 1948 |
# File 'lib/daru/dataframe.rb', line 1946 def join(other_df,opts={}) Daru::Core::Merge.join(self, other_df, opts) end |
#keep_row_if ⇒ Object
1117 1118 1119 1120 1121 |
# File 'lib/daru/dataframe.rb', line 1117 def keep_row_if @index .reject { |idx| yield access_row(idx) } .each { |idx| delete_row idx } end |
#keep_vector_if ⇒ Object
1123 1124 1125 1126 1127 |
# File 'lib/daru/dataframe.rb', line 1123 def keep_vector_if @vectors.each do |vector| delete_vector(vector) unless yield(@data[@vectors[vector]], vector) end end |
#map(axis = :vector, &block) ⇒ Object
Map over each vector or row of the data frame according to the argument specified. Will return an Array of the resulting elements. To map over each row/vector and get a DataFrame, see #recode.
Description
The #map iterator works like Array#map. The value returned by each run of the block is added to an Array and the Array is returned. This method also accepts an axis argument, like #each. The default is :vector.
Arguments
- axis - The axis to map over. Can be :vector (or :column) or :row. Defaults to :vector.
872 873 874 |
# File 'lib/daru/dataframe.rb', line 872 def map axis=:vector, &block dispatch_to_axis_pl axis, :map, &block end |
#map!(axis = :vector, &block) ⇒ Object
Destructive map. Modifies the DataFrame. Each run of the block must return a Daru::Vector. You can specify the axis to map over as the argument. Default to :vector.
Arguments
- axis - The axis to map over. Can be :vector (or :column) or :row. Defaults to :vector.
884 885 886 887 888 889 890 |
# File 'lib/daru/dataframe.rb', line 884 def map! axis=:vector, &block if %i[vector column].include?(axis) map_vectors!(&block) elsif axis == :row map_rows!(&block) end end |
#map_rows(&block) ⇒ Object
Map each row
991 992 993 994 995 |
# File 'lib/daru/dataframe.rb', line 991 def map_rows &block return to_enum(:map_rows) unless block_given? each_row.map(&block) end |
#map_rows! ⇒ Object
1003 1004 1005 1006 1007 1008 1009 1010 1011 |
# File 'lib/daru/dataframe.rb', line 1003 def map_rows! return to_enum(:map_rows!) unless block_given? index.dup.each do |i| row[i] = should_be_vector!(yield(row[i])) end self end |
#map_rows_with_index(&block) ⇒ Object
997 998 999 1000 1001 |
# File 'lib/daru/dataframe.rb', line 997 def map_rows_with_index &block return to_enum(:map_rows_with_index) unless block_given? each_row_with_index.map(&block) end |
#map_vectors(&block) ⇒ Object
Map each vector and return an Array.
966 967 968 969 970 |
# File 'lib/daru/dataframe.rb', line 966 def map_vectors &block return to_enum(:map_vectors) unless block_given? @data.map(&block) end |
#map_vectors! ⇒ Object
Destructive form of #map_vectors
973 974 975 976 977 978 979 980 981 |
# File 'lib/daru/dataframe.rb', line 973 def map_vectors! return to_enum(:map_vectors!) unless block_given? vectors.dup.each do |n| self[n] = should_be_vector!(yield(self[n])) end self end |
#map_vectors_with_index(&block) ⇒ Object
Map vectors along with the index.
984 985 986 987 988 |
# File 'lib/daru/dataframe.rb', line 984 def map_vectors_with_index &block return to_enum(:map_vectors_with_index) unless block_given? each_vector_with_index.map(&block) end |
#merge(other_df) ⇒ Daru::DataFrame
Merge vectors from two DataFrames. In case of a name collision, the vector names are changed to x_1, x_2, and so on.
1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 |
# File 'lib/daru/dataframe.rb', line 1901 def merge other_df # rubocop:disable Metrics/AbcSize unless nrows == other_df.nrows raise ArgumentError, "Number of rows must be equal in this: #{nrows} and other: #{other_df.nrows}" end new_fields = (@vectors.to_a + other_df.vectors.to_a) new_fields = ArrayHelper.recode_repeated(new_fields) DataFrame.new({}, order: new_fields).tap do |df_new| (0...nrows).each do |i| df_new.add_row row[i].to_a + other_df.row[i].to_a end df_new.index = @index if @index == other_df.index df_new.update end end |
#missing_values_rows(missing_values = [nil]) ⇒ Object Also known as: vector_missing_values
Return a vector with the number of missing values in each row.
Arguments
- missing_values - An Array of the values that should be treated as 'missing'. The default missing value is nil.
1237 1238 1239 1240 1241 1242 1243 |
# File 'lib/daru/dataframe.rb', line 1237 def missing_values_rows missing_values=[nil] number_of_missing = each_row.map do |row| row.indexes(*missing_values).size end Daru::Vector.new number_of_missing, index: @index, name: "#{@name}_missing_rows" end |
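The per-row count can be illustrated in plain Ruby. In this sketch each row is a Hash standing in for a row vector; the data is made up for illustration.

```ruby
# Each Hash stands in for one row of a DataFrame.
rows = [
  { a: 1,   b: nil, c: 3 },
  { a: nil, b: nil, c: 6 },
  { a: 7,   b: 8,   c: 9 }
]
missing_values = [nil]

# Count, for every row, how many values are considered missing.
counts = rows.map do |row|
  row.values.count { |value| missing_values.include?(value) }
end
```

Passing a different `missing_values` list, for example one that also contains an empty string, changes what is counted as missing.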
#ncols ⇒ Object
The number of vectors
1318 1319 1320 |
# File 'lib/daru/dataframe.rb', line 1318 def ncols @vectors.size end |
#nest(*tree_keys, &_block) ⇒ Object
Return a nested hash using vector names as keys and an array constructed of hashes with other values. If a block is provided, it is used to provide the values, with three parameters: the row of the dataset, the current last hash in the hierarchy, and the name of the key to include.
1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 |
# File 'lib/daru/dataframe.rb', line 1275 def nest *tree_keys, &_block tree_keys = tree_keys[0] if tree_keys[0].is_a? Array each_row.each_with_object({}) do |row, current| # Create tree *keys, last = tree_keys current = keys.inject(current) { |c, f| c[row[f]] ||= {} } name = row[last] if block_given? current[name] = yield(row, current, name) else current[name] ||= [] current[name].push(row.to_h.delete_if { |key,_value| tree_keys.include? key }) end end end |
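The tree-building loop above can be tried in plain Ruby on an array of row hashes. The helper `nest_rows` below is hypothetical and covers only the no-block branch; the sample data is made up for illustration.

```ruby
# Hypothetical helper, not daru API: nests row hashes under the values
# of the given keys, dropping those keys from the stored rows.
def nest_rows(rows, *tree_keys)
  rows.each_with_object({}) do |row, tree|
    *keys, last = tree_keys
    # Walk (and create as needed) one hash level per leading key.
    node = keys.inject(tree) { |current, key| current[row[key]] ||= {} }
    name = row[last]
    node[name] ||= []
    node[name] << row.reject { |key, _value| tree_keys.include?(key) }
  end
end

rows = [
  { country: 'AR', city: 'BA',  score: 3 },
  { country: 'AR', city: 'BA',  score: 4 },
  { country: 'CL', city: 'SCL', score: 5 }
]
nested = nest_rows(rows, :country, :city)
```

Each leading key adds one hash level, and the last key selects the array that collects the remaining fields of the matching rows.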
#nrows ⇒ Object
The number of rows
1313 1314 1315 |
# File 'lib/daru/dataframe.rb', line 1313 def nrows @index.size end |
#numeric_vector_names ⇒ Object
1708 1709 1710 |
# File 'lib/daru/dataframe.rb', line 1708 def numeric_vector_names @vectors.select { |v| self[v].numeric? } end |
#numeric_vectors ⇒ Object
Return the indexes of all the numeric vectors. Will include vectors with nils along with numbers.
1701 1702 1703 1704 1705 1706 |
# File 'lib/daru/dataframe.rb', line 1701 def numeric_vectors # FIXME: Why _with_index ?.. each_vector_with_index .select { |vec, _i| vec.numeric? } .map(&:last) end |
#one_to_many(parent_fields, pattern) ⇒ Object
Creates a new dataset for one-to-many relations on a dataset, based on a pattern of field names.
For example, suppose you have a survey on number of children with this structure:
id, name, child_name_1, child_age_1, child_name_2, child_age_2
With
ds.one_to_many([:id], "child_%v_%n")
the fields named in the first parameter are copied verbatim to the new dataset, and the fields that match the pattern in the second parameter are added as one case for each distinct %n.
1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 |
# File 'lib/daru/dataframe.rb', line 1981 def one_to_many(parent_fields, pattern) vars, numbers = one_to_many_components(pattern) DataFrame.new([], order: [*parent_fields, '_col_id', *vars]).tap do |ds| each_row do |row| verbatim = parent_fields.map { |f| [f, row[f]] }.to_h numbers.each do |n| generated = one_to_many_row row, n, vars, pattern next if generated.values.all?(&:nil?) ds.add_row(verbatim.merge(generated).merge('_col_id' => n)) end end ds.update end end |
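The expansion can be sketched in plain Ruby on row hashes. The helper `expand_one_to_many` is hypothetical; it does not reproduce daru's internal `one_to_many_components`, and it assumes that `%v` appears before `%n` in the pattern. The sample row is made up for illustration.

```ruby
# Hypothetical helper, not daru API: expands numbered fields such as
# child_name_1 / child_age_1 into one output row per number.
def expand_one_to_many(rows, parent_fields, pattern)
  # Turn "child_%v_%n" into a regex capturing the variable name and number.
  source = Regexp.escape(pattern).sub('%v') { '(.+?)' }.sub('%n') { '(\d+)' }
  regex  = /\A#{source}\z/

  rows.flat_map do |row|
    verbatim = parent_fields.map { |field| [field, row[field]] }.to_h
    # Group matching fields by number: { 1 => { "name" => ..., "age" => ... } }
    by_number = Hash.new { |hash, n| hash[n] = {} }
    row.each do |field, value|
      match = regex.match(field.to_s) or next
      by_number[match[2].to_i][match[1]] = value
    end
    by_number.sort.map do |n, generated|
      next if generated.values.all?(&:nil?) # skip cases with no data
      verbatim.merge(generated).merge('_col_id' => n)
    end.compact
  end
end

survey = [
  { id: 1, name: 'Ana', child_name_1: 'Bo', child_age_1: 4,
    child_name_2: nil, child_age_2: nil }
]
cases = expand_one_to_many(survey, [:id], 'child_%v_%n')
```

The second child slot contains only nils, so it produces no case, mirroring the `next if generated.values.all?(&:nil?)` guard in the listing.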
#only_numerics(opts = {}) ⇒ Object
Return a DataFrame of only the numerical Vectors. If clone: false is specified as option, only a view of the Vectors will be returned. Defaults to clone: true.
1715 1716 1717 1718 1719 1720 1721 |
# File 'lib/daru/dataframe.rb', line 1715 def only_numerics opts={} cln = opts[:clone] == false ? false : true arry = numeric_vectors.map { |v| self[v] } order = Index.new(numeric_vectors) Daru::DataFrame.new(arry, clone: cln, order: order, index: @index) end |
#order=(order_array) ⇒ Object
Reorder the vectors in a dataframe
1208 1209 1210 1211 1212 |
# File 'lib/daru/dataframe.rb', line 1208 def order=(order_array) raise ArgumentError, 'Invalid order' unless order_array.sort == vectors.to_a.sort initialize(to_h, order: order_array) end |
#pivot_table(opts = {}) ⇒ Object
Pivots a data frame on specified vectors and applies an aggregate function to quickly generate a summary.
Options
:index
- Keys to group by on the pivot table row index. Pass vector names contained in an Array.
:vectors
- Keys to group by on the pivot table column index. Pass vector names contained in an Array.
:agg
- Function to aggregate the grouped values. Default to :mean. Can use any of the statistics functions applicable on Vectors that can be found in the Daru::Statistics::Vector module.
:values
- Columns to aggregate. Will consider all numeric columns not specified in :index or :vectors. Optional.
Usage
df = Daru::DataFrame.new({
a: ['foo' , 'foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
b: ['one' , 'one', 'one', 'two', 'two', 'one', 'one', 'two', 'two'],
c: ['small','large','large','small','small','large','small','large','small'],
d: [1,2,2,3,3,4,5,6,7],
e: [2,4,4,6,6,8,10,12,14]
})
df.pivot_table(index: [:a], vectors: [:b], agg: :sum, values: :e)
#=>
# #<Daru::DataFrame:88342020 @name = 08cdaf4e-b154-4186-9084-e76dd191b2c9 @size = 2>
# [:e, :one] [:e, :two]
# [:bar] 18 26
# [:foo] 10 12
1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 |
# File 'lib/daru/dataframe.rb', line 1880 def pivot_table opts={} raise ArgumentError, 'Specify grouping index' if Array(opts[:index]).empty? index = opts[:index] vectors = opts[:vectors] || [] aggregate_function = opts[:agg] || :mean values = prepare_pivot_values index, vectors, opts raise IndexError, 'No numeric vectors to aggregate' if values.empty? grouped = group_by(index) return grouped.send(aggregate_function) if vectors.empty? super_hash = make_pivot_hash grouped, vectors, values, aggregate_function pivot_dataframe super_hash end |
#plot(*args, **options, &b) ⇒ Object
This method is overwritten; see Daru::DataFrame#plotting_library=
# File 'lib/daru/dataframe.rb', line 381
def plot(*args, **options, &b)
  init_plotting_library

  plot(*args, **options, &b)
end
#plotting_library=(lib) ⇒ Object
365 366 367 368 369 370 371 372 373 374 375 376 377 378 |
# File 'lib/daru/dataframe.rb', line 365 def plotting_library= lib case lib when :gruff, :nyaplot @plotting_library = lib if Daru.send("has_#{lib}?".to_sym) extend Module.const_get( "Daru::Plotting::DataFrame::#{lib.to_s.capitalize}Library" ) end else raise ArgumentError, "Plotting library #{lib} not supported. "\ 'Supported libraries are :nyaplot and :gruff' end end |
#recast(opts = {}) ⇒ Object
2214 2215 2216 2217 2218 |
# File 'lib/daru/dataframe.rb', line 2214 def recast opts={} opts.each do |vector_name, dtype| self[vector_name].cast(dtype: dtype) end end |
#recode(axis = :vector, &block) ⇒ Object
Maps over the DataFrame and returns a DataFrame. Each run of the block must return a Daru::Vector object. You can specify the axis to map over. Default to :vector.
Description
Recode works similarly to #map, but an important difference between the two is that recode returns a modified Daru::DataFrame instead of an Array. For this reason, #recode expects every run of the block to return a Daru::Vector.
Just like map and each, recode also accepts an optional axis argument.
Arguments
- axis - The axis to map over. Can be :vector (or :column) or :row. Defaults to :vector.
909 910 911 |
# File 'lib/daru/dataframe.rb', line 909 def recode axis=:vector, &block dispatch_to_axis_pl axis, :recode, &block end |
#recode_rows ⇒ Object
955 956 957 958 959 960 961 962 963 |
# File 'lib/daru/dataframe.rb', line 955 def recode_rows block_given? or return to_enum(:recode_rows) dup.tap do |df| df.each_row_with_index do |r, i| df.row[i] = should_be_vector!(yield(r)) end end end |
#recode_vectors ⇒ Object
945 946 947 948 949 950 951 952 953 |
# File 'lib/daru/dataframe.rb', line 945 def recode_vectors block_given? or return to_enum(:recode_vectors) dup.tap do |df| df.each_vector_with_index do |v, i| df[*i] = should_be_vector!(yield(v)) end end end |
#reindex(new_index) ⇒ Object
Change the index of the DataFrame and preserve the labels of the previous indexing. New index can be Daru::Index or any of its subclasses.
1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 |
# File 'lib/daru/dataframe.rb', line 1607 def reindex new_index unless new_index.is_a?(Daru::Index) raise ArgumentError, 'Must pass the new index of type Index or its '\ "subclasses, not #{new_index.class}" end cl = Daru::DataFrame.new({}, order: @vectors, index: new_index, name: @name) new_index.each_with_object(cl) do |idx, memo| memo.row[idx] = @index.include?(idx) ? row[idx] : [nil]*ncols end end |
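The effect of reindexing can be sketched in plain Ruby: labels found in the old index keep their rows, and labels that are new get a row of nils. The helper `reindex_rows` is hypothetical, and the rows are represented as a Hash of index label => row values.

```ruby
# Hypothetical helper, not daru API: rows are keyed by index label; labels
# missing from the old index get a row of nils, as in the listing above.
def reindex_rows(rows_by_label, new_labels, ncols)
  new_labels.map do |label|
    [label, rows_by_label.fetch(label) { [nil] * ncols }]
  end.to_h
end

old_rows  = { x: [1, 2], y: [3, 4], z: [5, 6] }
reindexed = reindex_rows(old_rows, %i[z x w], 2)
```

Row `:y` is dropped because it is absent from the new index, and `:w` is filled with nils because it was absent from the old one.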
#reindex_vectors(new_vectors) ⇒ Object
1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 |
# File 'lib/daru/dataframe.rb', line 1495 def reindex_vectors new_vectors unless new_vectors.is_a?(Daru::Index) raise ArgumentError, 'Must pass the new index of type Index or its '\ "subclasses, not #{new_vectors.class}" end cl = Daru::DataFrame.new({}, order: new_vectors, index: @index, name: @name) new_vectors.each_with_object(cl) do |vec, memo| memo[vec] = @vectors.include?(vec) ? self[vec] : [nil]*nrows end end |
#reject_values(*values) ⇒ Daru::DataFrame
Returns a dataframe in which rows with any of the mentioned values are ignored.
644 645 646 647 648 649 650 651 652 653 654 |
# File 'lib/daru/dataframe.rb', line 644 def reject_values(*values) positions = size.times.to_a - @data.flat_map { |vec| vec.positions(*values) } # Handle the case when positions size is 1 and #row_at wouldn't return a df if positions.size == 1 pos = positions.first row_at(pos..pos) else row_at(*positions) end end |
#rename(new_name) ⇒ Object Also known as: name=
Rename the DataFrame.
2139 2140 2141 2142 |
# File 'lib/daru/dataframe.rb', line 2139 def rename new_name @name = new_name self end |
#rename_vectors(name_map) ⇒ Object
Renames the vectors
Arguments
- name_map - A hash where the keys are the existing vector names and the values are the new names. If a vector is renamed to a vector name that is already in use, the existing one is overwritten.
Usage
df = Daru::DataFrame.new({ a: [1,2,3,4], b: [:a,:b,:c,:d], c: [11,22,33,44] })
df.rename_vectors :a => :alpha, :c => :gamma
df.vectors.to_a #=> [:alpha, :b, :gamma]
1691 1692 1693 1694 1695 1696 1697 |
# File 'lib/daru/dataframe.rb', line 1691 def rename_vectors name_map existing_targets = name_map.reject { |k,v| k == v }.values & vectors.to_a delete_vectors(*existing_targets) new_names = vectors.to_a.map { |v| name_map[v] ? name_map[v] : v } self.vectors = Daru::Index.new new_names end |
#replace_values(old_values, new_value) ⇒ Daru::DataFrame
Replace specified values with given value
678 679 680 681 |
# File 'lib/daru/dataframe.rb', line 678 def replace_values old_values, new_value @data.each { |vec| vec.replace_values old_values, new_value } self end |
#reset_index ⇒ Object
1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 |
# File 'lib/daru/dataframe.rb', line 1619 def reset_index index_df = index.to_df names = index.name names = [names] unless names.instance_of?(Array) new_vectors = names + vectors.to_a self.index = index_df.index names.each do |name| self[name] = index_df[name] end self.order = new_vectors self end |
#respond_to_missing?(name, include_private = false) ⇒ Boolean
2290 2291 2292 |
# File 'lib/daru/dataframe.rb', line 2290 def respond_to_missing?(name, include_private=false) name.to_s.end_with?('=') || has_vector?(name) || super end |
#rolling_fillna(direction = :forward) ⇒ Object
723 724 725 |
# File 'lib/daru/dataframe.rb', line 723 def rolling_fillna(direction=:forward) dup.rolling_fillna!(direction) end |
#rolling_fillna!(direction = :forward) ⇒ Object
Rolling fillna replaces all Float::NAN and nil values with the preceding or following value
718 719 720 721 |
# File 'lib/daru/dataframe.rb', line 718 def rolling_fillna!(direction=:forward) @data.each { |vec| vec.rolling_fillna!(direction) } self end |
#row ⇒ Object
Access a row or set/create a row. Refer #[] and #[]= docs for details.
Usage
df.row[:a] # access row named ':a'
df.row[:b] = [1,2,3] # set row ':b' to [1,2,3]
553 554 555 |
# File 'lib/daru/dataframe.rb', line 553 def row Daru::Accessors::DataFrameByRow.new(self) end |
#row_at(*positions) ⇒ Daru::Vector, Daru::DataFrame
Retrieve rows by positions
408 409 410 411 412 413 414 415 416 417 418 419 420 |
# File 'lib/daru/dataframe.rb', line 408 def row_at *positions original_positions = positions positions = coerce_positions(*positions, nrows) validate_positions(*positions, nrows) if positions.is_a? Integer row = get_rows_for([positions]) Daru::Vector.new row, index: @vectors else new_rows = get_rows_for(original_positions) Daru::DataFrame.new new_rows, index: @index.at(*original_positions), order: @vectors end end |
#save(filename) ⇒ Object
Use marshalling to save dataframe to a file.
2188 2189 2190 |
# File 'lib/daru/dataframe.rb', line 2188 def save filename Daru::IO.save self, filename end |
#set_at(positions, vector) ⇒ Object
Set vectors by positions
505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 |
# File 'lib/daru/dataframe.rb', line 505 def set_at positions, vector if positions.last == :row positions.pop return set_row_at(positions, vector) end validate_positions(*positions, ncols) vector = if vector.is_a? Daru::Vector vector.reindex @index else Daru::Vector.new vector end raise SizeError, 'Vector length should match index length' if vector.size != @index.size positions.each { |pos| @data[pos] = vector } end |
#set_index(new_index_col, opts = {}) ⇒ Object
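Set a particular column as the new index of the DataFrame. All elements of the column must be unique. The column is removed from the vectors unless the :keep option is set.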
1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 |
# File 'lib/daru/dataframe.rb', line 1568 def set_index new_index_col, opts={} if new_index_col.respond_to?(:to_a) strategy = SetMultiIndexStrategy new_index_col = new_index_col.to_a else strategy = SetSingleIndexStrategy end uniq_size = strategy.uniq_size(self, new_index_col) raise ArgumentError, 'All elements in new index must be unique.' if @size != uniq_size self.index = strategy.new_index(self, new_index_col) strategy.delete_vector(self, new_index_col) unless opts[:keep] self end |
#set_row_at(positions, vector) ⇒ Object
Set rows by positions
437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 |
# File 'lib/daru/dataframe.rb', line 437 def set_row_at positions, vector validate_positions(*positions, nrows) vector = if vector.is_a? Daru::Vector vector.reindex @vectors else Daru::Vector.new vector end raise SizeError, 'Vector length should match row length' if vector.size != @vectors.size @data.each_with_index do |vec, pos| vec.set_at(positions, vector.at(pos)) end @index = @data[0].index set_size end |
#shape ⇒ Object
Return the number of rows and columns of the DataFrame in an Array.
1308 1309 1310 |
# File 'lib/daru/dataframe.rb', line 1308 def shape [nrows, ncols] end |
#sort(vector_order, opts = {}) ⇒ Object
Non-destructive version of #sort!
1842 1843 1844 |
# File 'lib/daru/dataframe.rb', line 1842 def sort vector_order, opts={} dup.sort! vector_order, opts end |
#sort!(vector_order, opts = {}) ⇒ Object
Sorts a dataframe (ascending/descending) in the given priority sequence of vectors, with or without a block.
1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 |
# File 'lib/daru/dataframe.rb', line 1818 def sort! vector_order, opts={} raise ArgumentError, 'Required atleast one vector name' if vector_order.empty? # To enable sorting with categorical data, # map categories to integers preserving their order old = convert_categorical_vectors vector_order block = sort_prepare_block vector_order, opts order = @index.size.times.sort(&block) new_index = @index.reorder order # To reverse map mapping of categorical data to integers restore_categorical_vectors old @data.each do |vector| vector.reorder! order end self.index = new_index self end |
#split_by_category(cat_name) ⇒ Array
Split the dataframe into many dataframes based on category vector
2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 |
# File 'lib/daru/dataframe.rb', line 2322 def split_by_category cat_name cat_dv = self[cat_name] raise ArgumentError, "#{cat_name} is not a category vector" unless cat_dv.category? cat_dv.categories.map do |cat| where(cat_dv.eq cat) .rename(cat) .delete_vector cat_name end end |
#summary ⇒ String
Generate a summary of this DataFrame based on individual vectors in the DataFrame
1725 1726 1727 1728 1729 1730 1731 1732 1733 |
# File 'lib/daru/dataframe.rb', line 1725 def summary summary = "= #{name}" summary << "\n Number of rows: #{nrows}" @vectors.each do |v| summary << "\n Element:[#{v}]\n" summary << self[v].summary(1) end summary end |
#tail(quantity = 10) ⇒ Object Also known as: last
The last `quantity` rows of the DataFrame (ten by default)
1380 1381 1382 1383 |
# File 'lib/daru/dataframe.rb', line 1380 def tail quantity=10 start = [-quantity, -size].max row.at start..-1 end |
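The `[-quantity, -size].max` expression clamps the start position so that asking for more rows than exist returns the whole DataFrame. A plain-Ruby sketch with an Array standing in for the rows shows why the clamp matters; `tail_start` is a hypothetical helper name.

```ruby
# Hypothetical helper: negative start offset, clamped to the number of rows.
def tail_start(quantity, size)
  [-quantity, -size].max
end

rows = %w[r0 r1 r2 r3 r4]

last_two  = rows[tail_start(2, rows.size)..-1]   # normal case
all_rows  = rows[tail_start(10, rows.size)..-1]  # quantity larger than size
unclamped = rows[-10..-1]                        # out-of-range start on a plain Array
```

With a plain Array, the unclamped range starts before the first element and yields nil, whereas the clamped start falls back to the first row.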
#to_a ⇒ Object
Converts the DataFrame into a two-element Array. Element 0 is an array of hashes, one per row, where each key is a vector name and each value is the corresponding element. Element 1 is an array of the row indexes of the DataFrame. Each entry in the index array corresponds to the row hash at the same position.
2071 2072 2073 |
# File 'lib/daru/dataframe.rb', line 2071 def to_a [each_row.map(&:to_h), @index.to_a] end |
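The returned structure can be pictured with a plain-Ruby sketch. Here `columns` and `index` are stand-ins for a DataFrame's vectors and row index, with made-up values.

```ruby
# Stand-ins for a DataFrame's vectors and row index.
columns = { a: [1, 2], b: [3, 4] }
index   = %i[x y]

# One Hash per row (vector name => value), then the index labels.
row_hashes = index.each_index.map do |pos|
  columns.map { |name, values| [name, values[pos]] }.to_h
end
result = [row_hashes, index]
```

The first element holds the row data and the second holds the labels, so `result[0][i]` and `result[1][i]` always describe the same row.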
#to_category(*names) ⇒ Daru::DataFrame
Converts the specified non category type vectors to category type vectors
2270 2271 2272 2273 |
# File 'lib/daru/dataframe.rb', line 2270 def to_category *names names.each { |n| self[n] = self[n].to_category } self end |
#to_df ⇒ self
Returns the dataframe. This can be convenient when the user does not know whether the object is a vector or a dataframe.
# File 'lib/daru/dataframe.rb', line 2036
def to_df
  self
end
#to_gsl ⇒ Object
Convert all numeric vectors to GSL::Matrix
# File 'lib/daru/dataframe.rb', line 2041
def to_gsl
  numerics_as_arrays = numeric_vectors.map { |n| self[n].to_a }

  GSL::Matrix.alloc(*numerics_as_arrays.transpose)
end
#to_h ⇒ Object
Converts DataFrame to a hash (explicit) with keys as vector names and values as the corresponding vectors.
# File 'lib/daru/dataframe.rb', line 2087
def to_h
  @vectors
    .each_with_index
    .map { |vec_name, idx| [vec_name, @data[idx]] }.to_h
end
#to_html(threshold = 30) ⇒ Object
Convert to html for IRuby.
# File 'lib/daru/dataframe.rb', line 2094
def to_html(threshold=30)
  table_thead = to_html_thead
  table_tbody = to_html_tbody(threshold)
  path = if index.is_a?(MultiIndex)
           File.expand_path('../iruby/templates/dataframe_mi.html.erb', __FILE__)
         else
           File.expand_path('../iruby/templates/dataframe.html.erb', __FILE__)
         end
  ERB.new(File.read(path).strip).result(binding)
end
#to_html_tbody(threshold = 30) ⇒ Object
# File 'lib/daru/dataframe.rb', line 2115
def to_html_tbody(threshold=30)
  table_tbody_path =
    if index.is_a?(MultiIndex)
      File.expand_path('../iruby/templates/dataframe_mi_tbody.html.erb', __FILE__)
    else
      File.expand_path('../iruby/templates/dataframe_tbody.html.erb', __FILE__)
    end
  ERB.new(File.read(table_tbody_path).strip).result(binding)
end
#to_html_thead ⇒ Object
# File 'lib/daru/dataframe.rb', line 2105
def to_html_thead
  table_thead_path =
    if index.is_a?(MultiIndex)
      File.expand_path('../iruby/templates/dataframe_mi_thead.html.erb', __FILE__)
    else
      File.expand_path('../iruby/templates/dataframe_thead.html.erb', __FILE__)
    end
  ERB.new(File.read(table_thead_path).strip).result(binding)
end
#to_json(no_index = true) ⇒ Object
Convert to JSON. If no_index is true (the default), only the rows are serialized and the index is NOT included in the resulting JSON. If no_index is false, the index is included alongside the rows.
# File 'lib/daru/dataframe.rb', line 2077
def to_json no_index=true
  if no_index
    to_a[0].to_json
  else
    to_a.to_json
  end
end
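To make the effect of the no_index flag concrete, here is a plain-Ruby sketch that works on a hand-written stand-in for the value #to_a returns. `frame_json` and the sample data are hypothetical and only mirror the branch in #to_json.

```ruby
require 'json'

# Plain-Ruby stand-in for the value returned by #to_a:
# [array of row hashes, array of row index labels].
rows_and_index = [
  [{ a: 1, b: 'x' }, { a: 2, b: 'y' }],
  [:first, :second]
]

# Hypothetical helper mirroring the branch in #to_json.
def frame_json(rows_and_index, no_index = true)
  no_index ? rows_and_index[0].to_json : rows_and_index.to_json
end

puts frame_json(rows_and_index)         # rows only
puts frame_json(rows_and_index, false)  # rows followed by the index labels
```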
#to_matrix ⇒ Object
Convert all vectors of type :numeric into a Matrix.
# File 'lib/daru/dataframe.rb', line 2048
def to_matrix
  Matrix.columns each_vector.select(&:numeric?).map(&:to_a)
end
#to_nmatrix ⇒ Object
Convert all vectors of type :numeric and not containing nils into an NMatrix.
# File 'lib/daru/dataframe.rb', line 2060
def to_nmatrix
  each_vector.select do |vector|
    vector.numeric? && !vector.include_values?(*Daru::MISSING_VALUES)
  end.map(&:to_a).transpose.to_nm
end
#to_nyaplotdf ⇒ Object
Return a Nyaplot::DataFrame from the data of this DataFrame.
# File 'lib/daru/dataframe.rb', line 2054
def to_nyaplotdf
  Nyaplot::DataFrame.new(to_a[0])
end
#to_REXP ⇒ Object
# File 'lib/daru/extensions/rserve.rb', line 5
def to_REXP # rubocop:disable Style/MethodName
  names = @vectors.to_a
  data  = names.map do |f|
    Rserve::REXP::Wrapper.wrap(self[f].to_a)
  end
  l = Rserve::Rlist.new(data, names.map(&:to_s))
  Rserve::REXP.create_data_frame(l)
end
#to_s ⇒ Object
# File 'lib/daru/dataframe.rb', line 2125
def to_s
  "#<#{self.class}#{': ' + @name.to_s if @name}(#{nrows}x#{ncols})>"
end
#transpose ⇒ Object
Transpose a DataFrame, transposing the elements and swapping the row and column indexes.
# File 'lib/daru/dataframe.rb', line 2221
def transpose
  Daru::DataFrame.new(
    each_vector.map(&:to_a).transpose,
    index: @vectors,
    order: @index,
    dtype: @dtype,
    name: @name
  )
end
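The label swap is the part of #transpose that Array#transpose alone does not cover. A plain-Ruby sketch follows, assuming a frame is modelled as a Hash with :vectors, :index and :columns keys; this layout and `transpose_frame` are hypothetical, chosen only for illustration.

```ruby
# Plain-Ruby sketch (no Daru required) on a Hash-based stand-in for a frame.
def transpose_frame(frame)
  {
    vectors: frame[:index],             # old row labels become vector names
    index:   frame[:vectors],           # old vector names become row labels
    columns: frame[:columns].transpose  # each old row becomes a column
  }
end

frame = { vectors: [:a, :b], index: [:x, :y, :z], columns: [[1, 2, 3], [10, 20, 30]] }
t = transpose_frame(frame)
p t[:vectors]  # => [:x, :y, :z]
p t[:index]    # => [:a, :b]
p t[:columns]  # => [[1, 10], [2, 20], [3, 30]]
```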
#union(other_df) ⇒ Object
Concatenates another DataFrame like #concat, but additionally tries to preserve the index. If the two indexes share elements, #union overwrites the corresponding rows of the first DataFrame with the rows of other_df.
# File 'lib/daru/dataframe.rb', line 1527
def union other_df
  index = (@index.to_a + other_df.index.to_a).uniq
  df = row[*(@index.to_a - other_df.index.to_a)]

  df = df.concat(other_df)
  df.index = Daru::Index.new(index)
  df
end
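The documented overwrite rule resembles Hash#merge. The sketch below uses plain Hashes keyed by index label as stand-ins for DataFrames. It models only the documented behaviour (each label kept once, shared labels taken from the second frame) and is an assumption-laden illustration, not a reproduction of the row ordering produced by the implementation.

```ruby
# Plain-Ruby sketch (no Daru required): rows keyed by index label.
left  = { a: [1, 10], b: [2, 20] }
right = { b: [9, 90], c: [3, 30] }

# Hash#merge keeps every label once and lets the right-hand rows
# win on shared labels.
merged = left.merge(right)

p merged.keys  # => [:a, :b, :c]
p merged[:b]   # => [9, 90]
```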
#uniq(*vtrs) ⇒ Object
Return the rows that are unique with respect to the specified vectors, or with respect to all vectors if none are given. The first row of each group of duplicates is kept.
# File 'lib/daru/dataframe.rb', line 759
def uniq(*vtrs)
  vecs = vtrs.empty? ? vectors.to_a : Array(vtrs)
  grouped = group_by(vecs)
  indexes = grouped.groups.values.map { |v| v[0] }.sort
  row[*indexes]
end
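The group-then-take-first approach can be sketched in plain Ruby. Here rows are hashes and the Array position plays the role of the row index; `unique_rows` is a hypothetical helper, not a Daru method.

```ruby
# Plain-Ruby sketch (no Daru required).
rows = [
  { city: 'Pune', year: 2015 },
  { city: 'Pune', year: 2016 },
  { city: 'Oslo', year: 2015 },
  { city: 'Pune', year: 2015 }
]

# Hypothetical helper: group row positions by the selected values,
# keep the first position of each group, and restore the original order.
def unique_rows(rows, *keys)
  keys = rows.first.keys if keys.empty?
  groups = rows.each_index.group_by { |i| rows[i].values_at(*keys) }
  groups.values.map(&:first).sort.map { |i| rows[i] }
end

p unique_rows(rows, :city).size  # => 2
p unique_rows(rows).size         # => 3
```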
#update ⇒ Object
Updates the metadata (i.e. missing value positions) of the vectors after assignment, deletion, etc. are complete. This is provided so that time is not wasted recreating the metadata for a vector each time elements are assigned or deleted. Updating data this way is called lazy loading. To set or unset lazy loading, see the Daru.lazy_update= method.
# File 'lib/daru/dataframe.rb', line 2134
def update
  @data.each(&:update) if Daru.lazy_update
end
#vector_by_calculation(&block) ⇒ Object
DSL for yielding each row and returning a Daru::Vector based on the value each run of the block returns.
Usage
a1 = Daru::Vector.new([1, 2, 3, 4, 5, 6, 7])
a2 = Daru::Vector.new([10, 20, 30, 40, 50, 60, 70])
a3 = Daru::Vector.new([100, 200, 300, 400, 500, 600, 700])
ds = Daru::DataFrame.new({ :a => a1, :b => a2, :c => a3 })
total = ds.vector_by_calculation { a + b + c }
# <Daru::Vector:82314050 @name = nil @size = 7 >
# nil
# 0 111
# 1 222
# 2 333
# 3 444
# 4 555
# 5 666
# 6 777
# File 'lib/daru/dataframe.rb', line 1188
def vector_by_calculation &block
  a = each_row.map { |r| r.instance_eval(&block) }

  Daru::Vector.new a, index: @index
end
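The DSL works because the block is evaluated with each row as self, so bare names such as a and b resolve to that row's values. A plain-Ruby sketch of the mechanism follows, using a Struct as a stand-in for a row; `Row` and `calculate` are hypothetical names.

```ruby
# Plain-Ruby sketch (no Daru required): a Struct stands in for a row.
Row = Struct.new(:a, :b, :c)
rows = [Row.new(1, 10, 100), Row.new(2, 20, 200)]

# Hypothetical helper: instance_eval runs the block with the row as self,
# so `a`, `b` and `c` inside the block are method calls on that row.
def calculate(rows, &block)
  rows.map { |r| r.instance_eval(&block) }
end

p calculate(rows) { a + b + c }  # => [111, 222]
```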
#vector_count_characters(vecs = nil) ⇒ Object
# File 'lib/daru/dataframe.rb', line 1293
def vector_count_characters vecs=nil
  vecs ||= @vectors.to_a

  collect_rows do |row|
    vecs.map { |v| row[v].to_s.size }.inject(:+)
  end
end
#vector_mean(max_missing = 0) ⇒ Object
Calculate mean of the rows of the dataframe.
Arguments
-
max_missing - The maximum number of missing elements a row may contain for its mean to be calculated. Rows with more missing elements yield nil. Defaults to 0.
# File 'lib/daru/dataframe.rb', line 1449
def vector_mean max_missing=0
  # FIXME: in vector_sum we preserve created vector dtype, but
  # here we are not. Is this by design or ...? - zverok, 2016-05-18
  mean_vec = Daru::Vector.new [0]*@size, index: @index, name: "mean_#{@name}"

  each_row_with_index.each_with_object(mean_vec) do |(row, i), memo|
    memo[i] = row.indexes(*Daru::MISSING_VALUES).size > max_missing ? nil : row.mean
  end
end
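The max_missing rule can be illustrated with a plain-Ruby sketch. It rests on two assumptions that this section does not confirm: only nil is treated as missing (the contents of Daru::MISSING_VALUES are not shown here), and the mean of a row is taken over its non-missing values. `row_means` is a hypothetical helper.

```ruby
# Plain-Ruby sketch (no Daru required). Assumption: nil is the only
# missing value, and the mean ignores missing entries.
def row_means(rows, max_missing = 0)
  rows.map do |row|
    present = row.compact
    # Too many missing entries: the row's mean is reported as nil.
    next nil if row.size - present.size > max_missing

    present.sum.to_f / present.size
  end
end

rows = [[1, 2, 3], [4, nil, 6], [nil, nil, 9]]
p row_means(rows)     # => [2.0, nil, nil]
p row_means(rows, 1)  # => [2.0, 5.0, nil]
```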
#vector_sum(*args) ⇒ Object
Sum all numeric/specified vectors in the DataFrame.
Returns a new vector containing the sum of all numeric or specified vectors of the DataFrame. By default, if a vector contains a nil, the sum at that position is nil. With the :skipnil argument set to true, nil values are treated as 0 (zero) when computing the sum.
# File 'lib/daru/dataframe.rb', line 1431
def vector_sum(*args)
  defaults = {vecs: nil, skipnil: false}
  options = args.last.is_a?(::Hash) ? args.pop : {}
  options = defaults.merge(options)
  vecs = args[0] || options[:vecs]
  skipnil = args[1] || options[:skipnil]

  vecs ||= numeric_vectors
  sum = Daru::Vector.new [0]*@size, index: @index, name: @name, dtype: @dtype
  vecs.inject(sum) { |memo, n| self[n].add(memo, skipnil: skipnil) }
end
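The nil handling described above can be sketched in plain Ruby. Columns are plain Arrays and `column_sum` is a hypothetical helper. It mirrors the documented rule only and does not call Daru::Vector#add.

```ruby
# Plain-Ruby sketch (no Daru required): element-wise sum across columns.
def column_sum(columns, skipnil: false)
  columns.transpose.map do |values|
    if values.include?(nil)
      # Documented rule: nil poisons the sum unless skipnil is set,
      # in which case nil counts as zero.
      skipnil ? values.compact.sum : nil
    else
      values.sum
    end
  end
end

cols = [[1, 2, nil], [10, 20, 30]]
p column_sum(cols)                 # => [11, 22, nil]
p column_sum(cols, skipnil: true)  # => [11, 22, 30]
```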
#verify(*tests) ⇒ Object
Test each row with one or more tests. The function returns an array with all errors.
FIXME: description here is too sparse. As far as I can get, it should tell something about that each test is [descr, fields, block], and that first value may be column name to output. - zverok, 2016-05-18
# File 'lib/daru/dataframe.rb', line 1160
def verify(*tests)
  id = tests.first.is_a?(Symbol) ? tests.shift : @vectors.first

  each_row_with_index.map do |row, i|
    tests.reject { |*_, block| block.call(row) }
         .map { |test| verify_error_message row, test, id, i }
  end.flatten
end
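Going by the FIXME note, each test appears to be a [description, fields, block] triple whose block returns true when a row passes. The plain-Ruby sketch below follows that reading. The error string format is invented for illustration, since the message text Daru actually produces is not shown in this section.

```ruby
# Plain-Ruby sketch (no Daru required). Rows are hashes.
rows = [{ id: 1, age: 30 }, { id: 2, age: -5 }, { id: 3, age: 200 }]

# Each test is [description, fields, block]; the block returns true on success.
tests = [
  ['age must be non-negative', [:age], ->(row) { row[:age] >= 0 }],
  ['age must be below 150',    [:age], ->(row) { row[:age] < 150 }]
]

# Keep only the failing tests per row and turn each into a message.
# The "id: description" format is hypothetical.
errors = rows.flat_map do |row|
  tests.reject { |*_, block| block.call(row) }
       .map { |descr, _fields, _block| "#{row[:id]}: #{descr}" }
end

p errors  # => ["2: age must be non-negative", "3: age must be below 150"]
```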
#where(bool_array) ⇒ Object
Query a DataFrame by passing a Daru::Core::Query::BoolArray object.
# File 'lib/daru/dataframe.rb', line 2246
def where bool_array
  Daru::Core::Query.df_where self, bool_array
end
#which(&block) ⇒ Object
# File 'lib/daru/extensions/which_dsl.rb', line 15
def which(&block)
  WhichQuery.new(self, &block).exec
end
#write_csv(filename, opts = {}) ⇒ Object
Write this DataFrame to a CSV file.
Arguments
-
filename - Path of CSV file where the DataFrame is to be saved.
Options
-
convert_comma - If set to true, will convert any commas in any of the data to full stops (‘.’).
All the options accepted by CSV.read() can also be passed into this function.
# File 'lib/daru/dataframe.rb', line 2158
def write_csv filename, opts={}
  Daru::IO.dataframe_write_csv self, filename, opts
end
#write_excel(filename, opts = {}) ⇒ Object
Write this dataframe to an Excel Spreadsheet
Arguments
-
filename - The path of the file where the DataFrame should be written.
# File 'lib/daru/dataframe.rb', line 2167
def write_excel filename, opts={}
  Daru::IO.dataframe_write_excel self, filename, opts
end
#write_sql(dbh, table) ⇒ Object
Insert each row of the DataFrame into the selected table.
Arguments
-
dbh - DBI database connection object.
-
table - Name of the table into which the rows are inserted.
Usage
ds = Daru::DataFrame.new({:id=>Daru::Vector.new([1,2,3]), :name=>Daru::Vector.new(["a","b","c"])})
dbh = DBI.connect("DBI:Mysql:database:localhost", "user", "password")
ds.write_sql(dbh,"test")
# File 'lib/daru/dataframe.rb', line 2183
def write_sql dbh, table
  Daru::IO.dataframe_write_sql self, dbh, table
end