Class: Daru::DataFrame

Inherits: Object

Includes: Maths::Arithmetic::DataFrame, Maths::Statistics::DataFrame, Plotting::DataFrame

Defined in: lib/daru/dataframe.rb, lib/daru/extensions/rserve.rb

Instance Attribute Summary

#index ⇒ Object

The index of the rows of the DataFrame.
#name ⇒ Object readonly

The name of the DataFrame.
#size ⇒ Object readonly

The number of rows present in the DataFrame.
#vectors ⇒ Object

The vectors (columns) index of the DataFrame.

Class Method Summary

._load(data) ⇒ Object
.crosstab_by_assignation(rows, columns, values) ⇒ Object

Generates a new dataset, using three vectors: rows, columns, and values.
.from_activerecord(relation, *fields) ⇒ Object

Read a dataframe from AR::Relation.
.from_csv(path, opts = {}, &block) ⇒ Object

Load data from a CSV file.
.from_excel(path, opts = {}, &block) ⇒ Object

Read data from an Excel file into a DataFrame.
.from_plaintext(path, fields) ⇒ Object

Read the database from a plaintext file.
.from_sql(dbh, query) ⇒ Object

Reads a database query and returns a Dataset.
.rows(source, opts = {}) ⇒ Object

Create DataFrame by specifying rows as an Array of Arrays or Array of Daru::Vector objects.

Instance Method Summary

#==(other) ⇒ Object
#[](*names) ⇒ Object

Access row or vector.
#[]=(*args) ⇒ Object

Insert a new row/vector of the specified name or modify a previous row.
#_dump(depth) ⇒ Object
#add_row(row, index = nil) ⇒ Object
#add_vector(n, vector) ⇒ Object
#add_vectors_by_split(name, join = '-', sep = Daru::SPLIT_TOKEN) ⇒ Object
#add_vectors_by_split_recode(name_, join = '-', sep = Daru::SPLIT_TOKEN) ⇒ Object
#all?(axis = :vector, &block) ⇒ Boolean

Works like Array#all?.
#any?(axis = :vector, &block) ⇒ Boolean

Works like Array#any?.
#bootstrap(n = nil) ⇒ Daru::DataFrame

Creates a DataFrame of n rows drawn at random from this DataFrame.
#clone(*vectors_to_clone) ⇒ Object

Returns a 'view' of the DataFrame, i.e. the object IDs of vectors are preserved.
#clone_only_valid ⇒ Object

Returns a ‘shallow’ copy of DataFrame if missing data is not present, or a full copy of only valid data if missing data is present.
#clone_structure ⇒ Object

Only clone the structure of the DataFrame.
#collect(axis = :vector, &block) ⇒ Object

Iterate over a row or vector and return results in a Daru::Vector.
#collect_matrix ⇒ ::Matrix

Generate a matrix, based on vector names of the DataFrame.
#collect_row_with_index(&block) ⇒ Object
#collect_rows(&block) ⇒ Object

Retrieves a Daru::Vector, based on the result of calculation performed on each row.
#collect_vector_with_index(&block) ⇒ Object
#collect_vectors(&block) ⇒ Object

Retrieves a Daru::Vector, based on the result of calculation performed on each vector.
#column(name) ⇒ Object

Access a vector by name.
#compute(text, &block) ⇒ Object

Returns a vector, based on a string with a calculation based on vector.
#concat(other_df) ⇒ Object

Concatenate another DataFrame along corresponding columns.
#create_sql(table, charset = "UTF8") ⇒ Object

Create an SQL CREATE TABLE statement based on a given Dataset.
#delete_row(index) ⇒ Object

Delete a row.
#delete_vector(vector) ⇒ Object

Delete a vector.
#dup(vectors_to_dup = nil) ⇒ Object

Duplicate the DataFrame entirely.
#dup_only_valid(vecs = nil) ⇒ Object

Creates a new duplicate dataframe containing only rows without a single missing value.
#each(axis = :vector, &block) ⇒ Object

Iterate over each row or vector of the DataFrame.
#each_index(&block) ⇒ Object

Iterate over each index of the DataFrame.
#each_row(&block) ⇒ Object

Iterate over each row.
#each_row_with_index(&block) ⇒ Object
#each_vector(&block) ⇒ Object (also: #each_column)

Iterate over each vector.
#each_vector_with_index(&block) ⇒ Object (also: #each_column_with_index)

Iterate over each vector along with the name of the vector.
#filter(axis = :vector, &block) ⇒ Object

Retain vectors or rows if the block returns a truthy value.
#filter_rows(&block) ⇒ Object

Iterates over each row and retains it in a new DataFrame if the block returns true for that row.
#filter_vector(vec) ⇒ Object

Creates a new vector with the data of a given field for which the block returns true.
#filter_vectors(&block) ⇒ Object

Iterates over each vector and retains it in a new DataFrame if the block returns true for that vector.
#group_by(*vectors) ⇒ Object

Group elements by vector to perform operations on them.
#has_missing_data? ⇒ Boolean (also: #flawed?)
#has_vector?(vector) ⇒ Boolean

Check if a vector is present.
#head(quantity = 10) ⇒ Object (also: #first)

The first ten elements of the DataFrame.
#initialize(source, opts = {}) ⇒ DataFrame constructor

DataFrame basically consists of an Array of Vector objects.
#inspect(spacing = 10, threshold = 15) ⇒ Object

Pretty print in a nice table format for the command line (irb/pry/iruby).
#join(other_df, opts = {}) ⇒ Daru::DataFrame

Join 2 DataFrames with SQL style joins.
#keep_row_if(&block) ⇒ Object
#keep_vector_if(&block) ⇒ Object
#map(axis = :vector, &block) ⇒ Object

Map over each vector or row of the data frame according to the argument specified.
#map!(axis = :vector, &block) ⇒ Object

Destructive map.
#map_rows(&block) ⇒ Object

Map each row.
#map_rows!(&block) ⇒ Object
#map_rows_with_index(&block) ⇒ Object
#map_vectors(&block) ⇒ Object

Map each vector and return an Array.
#map_vectors!(&block) ⇒ Object

Destructive form of #map_vectors.
#map_vectors_with_index(&block) ⇒ Object

Map vectors along with the index.
#merge(other_df) ⇒ Daru::DataFrame

Merge vectors from two DataFrames.
#method_missing(name, *args, &block) ⇒ Object
#missing_values_rows(missing_values = [nil]) ⇒ Object (also: #vector_missing_values)

Return a vector with the number of missing values in each row.
#ncols ⇒ Object

The number of vectors.
#nest(*tree_keys, &block) ⇒ Object

Return a nested hash using vector names as keys and an array constructed of hashes with other values.
#nrows ⇒ Object

The number of rows.
#numeric_vector_names ⇒ Object
#numeric_vectors ⇒ Object

Return the indexes of all the numeric vectors.
#one_to_many(parent_fields, pattern) ⇒ Object

Creates a new dataset for one to many relations on a dataset, based on pattern of field names.
#only_numerics(opts = {}) ⇒ Object

Return a DataFrame of only the numerical Vectors.
#pivot_table(opts = {}) ⇒ Object

Pivots a data frame on specified vectors and applies an aggregate function to quickly generate a summary.
#recast(opts = {}) ⇒ Object

Change dtypes of vectors by supplying a hash of :vector_name => :new_dtype.
#recode(axis = :vector, &block) ⇒ Object

Maps over the DataFrame and returns a DataFrame.
#recode_rows(&block) ⇒ Object
#recode_vectors(&block) ⇒ Object
#reindex(new_index) ⇒ Object

Change the index of the DataFrame and preserve the labels of the previous indexing.
#reindex_vectors(new_vectors) ⇒ Object
#rename(new_name) ⇒ Object

Rename the DataFrame.
#report_building(b) ⇒ Object

#row ⇒ Object

Access a row or set/create a row.
#save(filename) ⇒ Object

Use marshalling to save dataframe to a file.
#set_index(new_index, opts = {}) ⇒ Object

Set a particular column as the new index of the DataFrame.
#shape ⇒ Object

Return the number of rows and columns of the DataFrame in an Array.
#sort(vector_order, opts = {}) ⇒ Object

Non-destructive version of #sort!.
#sort!(vector_order, opts = {}) ⇒ Object

Sorts a dataframe (ascending/descending) according to the given sequence of vectors, using the attributes provided in the blocks.
#summary(method = :to_text) ⇒ Object

Generate a summary of this DataFrame with ReportBuilder.
#tail(quantity = 10) ⇒ Object (also: #last)

The last ten elements of the DataFrame.
#to_a ⇒ Object

Converts the DataFrame into an array of hashes where key is vector name and value is the corresponding element.
#to_gsl ⇒ Object

Convert all numeric vectors to GSL::Matrix.
#to_hash ⇒ Object

Converts DataFrame to a hash with keys as vector names and values as the corresponding vectors.
#to_html(threshold = 30) ⇒ Object

Convert to html for IRuby.
#to_json(no_index = true) ⇒ Object

Convert to json.
#to_matrix ⇒ Object

Convert all vectors of type :numeric into a Matrix.
#to_nmatrix ⇒ Object

Convert all vectors of type :numeric and not containing nils into an NMatrix.
#to_nyaplotdf ⇒ Object

Return a Nyaplot::DataFrame from the data of this DataFrame.
#to_REXP ⇒ Object
#to_s ⇒ Object
#transpose ⇒ Object

Transpose a DataFrame, transposing elements and row, column indexing.
#update ⇒ Object

Method for updating the metadata (i.e. missing value positions) of the DataFrame after assignment, deletion, etc.
#vector(*args) ⇒ Object
#vector_by_calculation(&block) ⇒ Object

DSL for yielding each row and returning a Daru::Vector based on the value each run of the block returns.
#vector_count_characters(vecs = nil) ⇒ Object
#vector_mean(max_missing = 0) ⇒ Object

Calculate mean of the rows of the dataframe.
#vector_sum(vecs = nil) ⇒ Object

Returns a vector with sum of all vectors specified in the argument.
#verify(*tests) ⇒ Object

Test each row with one or more tests.
#where(bool_array) ⇒ Object

Query a DataFrame by passing a Daru::Core::Query::BoolArray object.
#write_csv(filename, opts = {}) ⇒ Object

Write this DataFrame to a CSV file.
#write_excel(filename, opts = {}) ⇒ Object

Write this dataframe to an Excel Spreadsheet.
#write_sql(dbh, table) ⇒ Object

Insert each case of the Dataset on the selected table.

Constructor Details

#initialize(source, opts = {}) ⇒ `DataFrame`

A DataFrame basically consists of an Array of Vector objects. These objects are indexed by row through the index object and by column through the vectors object, both of which are Index objects.

Arguments

source - Source from which the DataFrame is to be initialized. Can be a Hash of names and vectors (Array or Daru::Vector), an Array of Arrays, or an Array of Daru::Vectors.

Options

:order - An Array/Daru::Index/Daru::MultiIndex containing the order in which Vectors should appear in the DataFrame.

:index - An Array/Daru::Index/Daru::MultiIndex containing the order in which rows of the DataFrame will be named.

:name - A name for the DataFrame.

:clone - Specify as true or false. When set to false, and Vector objects are passed for the source, the Vector objects will not be duplicated when creating the DataFrame. Has no effect if an Array is passed as the source, or if the passed Daru::Vectors have different indexes. Defaults to true.

Usage

df = Daru::DataFrame.new({a: [1,2,3,4], b: [6,7,8,9]}, order: [:b, :a],
  index: [:a, :b, :c, :d], name: :spider_man)

# =>
# <Daru::DataFrame:80766980 @name = spider_man @size = 4>
#             b          a
#  a          6          1
#  b          7          2
#  c          8          3
#  d          9          4

# File 'lib/daru/dataframe.rb', line 246

def initialize source, opts={}
  vectors = opts[:order]
  index   = opts[:index]
  clone   = opts[:clone] == false ? false : true
  @data   = []

  temp_name = opts[:name]
  @name   = temp_name || SecureRandom.uuid

  if source.empty?
    @vectors = try_create_index vectors
    @index   = try_create_index index
    create_empty_vectors
  else
    case source
    when Array
      if source.all? { |s| s.is_a?(Array) }
        raise ArgumentError, "Number of vectors (#{vectors.size}) should \
          equal order size (#{source.size})" if source.size != vectors.size

        @index   = try_create_index(index || source[0].size)
        @vectors = try_create_index(vectors)

        @vectors.each_with_index do |vec,idx|
          @data << Daru::Vector.new(source[idx], index: @index)
        end
      elsif source.all? { |s| s.is_a?(Daru::Vector) }
        hsh = {}
        vectors.each_with_index do |name, idx|
          hsh[name] = source[idx]
        end
        initialize(hsh, index: index, order: vectors, name: @name, clone: clone)
      else # array of hashes
        if vectors.nil?
          @vectors = Daru::Index.new source[0].keys
        else
          @vectors = Daru::Index.new(
            (vectors + (source[0].keys - vectors)).uniq)
        end
        @index = Daru::Index.new(index || source.size)

        @vectors.each do |name|
          v = []
          source.each do |hsh|
            v << (hsh[name] || hsh[name.to_s])
          end

          @data << Daru::Vector.new(v, name: set_name(name), index: @index)
        end
      end
    when Hash
      create_vectors_index_with vectors, source
      if all_daru_vectors_in_source? source
        if !index.nil?
          @index = try_create_index index
        elsif all_vectors_have_equal_indexes?(source)
          vectors_have_same_index = true
          @index = source.values[0].index.dup
        else
          all_indexes = []
          source.each_value do |vector|
            all_indexes << vector.index.to_a
          end
          # sort only if missing indexes detected
          all_indexes.flatten!.uniq!.sort!

          @index = Daru::Index.new all_indexes
          clone = true
        end

        if clone
          @vectors.each do |vector|
            # avoids matching indexes of vectors if all the supplied vectors
            # have the same index.
            if vectors_have_same_index
              v = source[vector].dup
            else
              v = Daru::Vector.new([], name: vector, index: @index)

              @index.each do |idx|
                if source[vector].index.include? idx
                  v[idx] = source[vector][idx]
                else
                  v[idx] = nil
                end
              end
            end
            @data << v
          end
        else
          @data.concat source.values
        end
      else
        @index = try_create_index(index || source.values[0].size)

        @vectors.each do |name|
          @data << Daru::Vector.new(source[name].dup, name: set_name(name), index: @index)
        end
      end
    end
  end

  set_size
  validate
  update
end
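
When the source is a Hash of Daru::Vectors whose indexes differ and no :index option is given, the listing above builds the sorted union of all index labels and fills the gaps with nil. The following is a minimal plain-Ruby sketch of that alignment step only. It does not use Daru: each "vector" is modelled as a Hash of label => value, and the method name align_columns is hypothetical.

```ruby
# Plain-Ruby sketch (no Daru required). Each column is a Hash of
# index label => value; align_columns is a hypothetical helper name.
def align_columns(columns)
  # Sorted union of every label that appears in any column.
  index = columns.values.flat_map(&:keys).uniq.sort

  # Re-read every column against the shared index; a missing label yields nil.
  aligned = columns.transform_values do |col|
    index.map { |label| col[label] }
  end

  [index, aligned]
end

index, aligned = align_columns({ a: { x: 1, y: 2 }, b: { y: 20, z: 30 } })
p index        # [:x, :y, :z]
p aligned[:a]  # [1, 2, nil]
p aligned[:b]  # [nil, 20, 30]
```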

Dynamic Method Handling

This class handles dynamic methods through the method_missing method

#method_missing(name, *args, &block) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 1984

def method_missing(name, *args, &block)
  if md = name.match(/(.+)\=/)
    insert_or_modify_vector name[/(.+)\=/].delete("=").to_sym, args[0]
  elsif self.has_vector? name
    self[name]
  else
    super(name, *args, &block)
  end
end
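
The dispatch above means that a name ending in "=" inserts or modifies a vector, and a bare name that matches an existing vector reads it. A rough plain-Ruby sketch of the same pattern follows. TinyFrame is a hypothetical stand-in class, not part of Daru, and the respond_to_missing? override is an addition that does not appear in the listing above.

```ruby
# Hypothetical stand-in class illustrating the dispatch pattern; not Daru code.
class TinyFrame
  def initialize(columns)
    @columns = columns
  end

  def method_missing(name, *args, &block)
    if name.to_s.end_with?('=')
      # frame.b = [...] inserts or replaces the column named :b.
      @columns[name.to_s.chomp('=').to_sym] = args[0]
    elsif @columns.key?(name)
      # frame.a reads the column named :a.
      @columns[name]
    else
      super
    end
  end

  # Keeps respond_to? consistent with the dynamic methods above.
  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?('=') || @columns.key?(name) || super
  end
end

frame = TinyFrame.new({ a: [1, 2, 3] })
frame.b = [4, 5, 6]
p frame.a  # [1, 2, 3]
p frame.b  # [4, 5, 6]
```

Calling a name that is neither a setter nor an existing column falls through to super and raises NoMethodError.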

Instance Attribute Details

#index ⇒ `Object`

The index of the rows of the DataFrame




# File 'lib/daru/dataframe.rb', line 202

def index
  @index
end

#name ⇒ `Object` (readonly)

The name of the DataFrame




# File 'lib/daru/dataframe.rb', line 205

def name
  @name
end

#size ⇒ `Object` (readonly)

The number of rows present in the DataFrame




# File 'lib/daru/dataframe.rb', line 208

def size
  @size
end

#vectors ⇒ `Object`

The vectors (columns) index of the DataFrame




# File 'lib/daru/dataframe.rb', line 199

def vectors
  @vectors
end

Class Method Details

._load(data) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 1908

def self._load data
  h = Marshal.load data
  Daru::DataFrame.new(h[:data],
    index: h[:index],
    order: h[:order],
    name:  h[:name])
end

.crosstab_by_assignation(rows, columns, values) ⇒ `Object`

Generates a new dataset, using three vectors: rows, columns, and values.

For example, you have these values

x   y   v
a   a   0
a   b   1
b   a   1
b   b   0

You obtain

id  a   b
 a  0   1
 b  1   0

Useful to process outputs from databases

# File 'lib/daru/dataframe.rb', line 164

def crosstab_by_assignation rows, columns, values
  raise "Three vectors should be equal size" if
    rows.size != columns.size or rows.size!=values.size

  cols_values = columns.factors
  cols_n      = cols_values.size

  h_rows = rows.factors.inject({}) do |a,v|
    a[v] = cols_values.inject({}) do |a1,v1|
      a1[v1]=nil
      a1
    end
    a
  end

  values.each_index do |i|
    h_rows[rows[i]][columns[i]] = values[i]
  end
  df = Daru::DataFrame.new({}, order: [:_id] + cols_values.to_a)

  rows.factors.each do |row|
    n_row = Array.new(cols_n+1)
    n_row[0] = row
    cols_values.each_index do |i|
      n_row[i+1] = h_rows[row][cols_values[i]]
    end

    df.add_row(n_row)
  end
  df.update
  df
end
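
The x/y/v example above can be reproduced with plain arrays and hashes. This is a sketch of the same idea under the assumption that the three inputs are ordinary Ruby Arrays rather than Daru::Vectors; the method name crosstab is hypothetical.

```ruby
# Plain-Ruby sketch (no Daru required); crosstab is a hypothetical helper.
def crosstab(rows, columns, values)
  unless rows.size == columns.size && rows.size == values.size
    raise ArgumentError, 'Three arrays should be equal size'
  end

  col_labels = columns.uniq
  # One Hash per distinct row label, pre-filled with nil for every column label.
  table = rows.uniq.to_h { |r| [r, col_labels.to_h { |c| [c, nil] }] }

  # Assign each value to the cell addressed by its (row, column) pair.
  rows.each_index { |i| table[rows[i]][columns[i]] = values[i] }
  table
end

table = crosstab(%w[a a b b], %w[a b a b], [0, 1, 1, 0])
table.each { |label, cells| puts "#{label}  #{cells.values.join('  ')}" }
# a  0  1
# b  1  0
```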

.from_activerecord(relation, *fields) ⇒ `Object`

Read a dataframe from AR::Relation

USE:

# When Post model is defined as:
class Post < ActiveRecord::Base
  scope :active, -> { where.not(published_at: nil) }
end

# You can load active posts into a dataframe by:
Daru::DataFrame.from_activerecord(Post.active, :title, :published_at)

Parameters:

relation (ActiveRecord::Relation) —

An AR::Relation object from which data is loaded

Returns:

A dataframe containing the data loaded from the relation




# File 'lib/daru/dataframe.rb', line 95

def from_activerecord relation, *fields
  Daru::IO.from_activerecord relation, *fields
end

.from_csv(path, opts = {}, &block) ⇒ `Object`

Load data from a CSV file. Specify an optional block to grab the CSV object and pre-condition it (for example, use the `convert` or `header_convert` methods).

Arguments

path - Path of the file to load specified as a String.

Options

Accepts the same options as the Daru::DataFrame constructor and CSV.open() and uses those to eventually construct the resulting DataFrame.

Verbose Description

You can specify all the options to the `.from_csv` function that you do to the Ruby `CSV.read()` function, since this is what is used internally.

For example, if the columns in your CSV file are separated by something other than commas, you can use the `:col_sep` option. If you want to convert numeric values to numbers and not keep them as strings, you can use the `:converters` option and set it to `:numeric`.

The `.from_csv` function uses the following defaults for reading CSV files (that are passed into the `CSV.read()` function):

{
  :col_sep           => ',',
  :converters        => :numeric
}




# File 'lib/daru/dataframe.rb', line 47

def from_csv path, opts={}, &block
  Daru::IO.from_csv path, opts, &block
end
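
To see what the :col_sep and :converters options described above do, the sketch below calls Ruby's standard CSV library directly instead of Daru::DataFrame.from_csv. The inline data and the parse_columns helper are hypothetical, and the result is a plain Hash of columns rather than a DataFrame.

```ruby
require 'csv'

# Hypothetical helper: parses CSV text into a Hash of header => Array of values.
def parse_columns(text, col_sep: ',')
  # converters: :numeric turns "2" into 2 and "1.5" into 1.5; other cells stay Strings.
  table = CSV.parse(text, headers: true, col_sep: col_sep, converters: :numeric)
  table.headers.to_h { |header| [header, table[header]] }
end

# Semicolon-separated input, so :col_sep must be overridden.
columns = parse_columns("name;score\nalice;1.5\nbob;2\n", col_sep: ';')
p columns['name']   # ["alice", "bob"]
p columns['score']  # [1.5, 2]
```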

.from_excel(path, opts = {}, &block) ⇒ `Object`

Read data from an Excel file into a DataFrame.

Arguments

path - Path of the file to be read.

Options

:worksheet_id - ID of the worksheet that is to be read.




# File 'lib/daru/dataframe.rb', line 60

def from_excel path, opts={}, &block
  Daru::IO.from_excel path, opts, &block
end

.from_plaintext(path, fields) ⇒ `Object`

Read the database from a plaintext file. For this method to work, the data should be present in a plain text file in columns. See spec/fixtures/bank2.dat for an example.

Arguments

path - Path of the file to be read.
fields - Vector names of the resulting database.

Usage

df = Daru::DataFrame.from_plaintext 'spec/fixtures/bank2.dat', [:v1,:v2,:v3,:v4,:v5,:v6]




# File 'lib/daru/dataframe.rb', line 111

def from_plaintext path, fields
  Daru::IO.from_plaintext path, fields
end

.from_sql(dbh, query) ⇒ `Object`

Reads a database query and returns a Dataset.

USE:

dbh = DBI.connect("DBI:Mysql:database:localhost", "user", "password")
Daru::DataFrame.from_sql(dbh, "SELECT * FROM test")

Parameters:

dbh (DBI::DatabaseHandle) —

A DBI connection to be used to run the query
query (String) —

The query to be executed

Returns:

A dataframe containing the data resulting from the query




# File 'lib/daru/dataframe.rb', line 75

def from_sql dbh, query
  Daru::IO.from_sql dbh, query
end

.rows(source, opts = {}) ⇒ `Object`

Create DataFrame by specifying rows as an Array of Arrays or Array of Daru::Vector objects.

# File 'lib/daru/dataframe.rb', line 117

def rows source, opts={}
  df = nil
  if source.all? { |v| v.size == source[0].size }
    first = source[0]
    index = []
    opts[:order] ||=
    if first.is_a?(Daru::Vector) # assume that all are Vectors
      source.each { |vec| index << vec.name }
      first.index.to_a
    elsif first.is_a?(Array)
      Array.new(first.size) { |i| i.to_s }
    end

    if source.all? { |s| s.is_a?(Array) }
      df = Daru::DataFrame.new(source.transpose, opts)
    else # array of Daru::Vectors
      df = Daru::DataFrame.new({}, opts)
      source.each_with_index do |row, idx|
        df[(index[idx] || idx), :row] = row
      end
    end
  else
    raise SizeError, "All vectors must have same length"
  end

  df
end
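
For an Array of Arrays, the listing above checks that all rows have the same length, defaults the vector names to "0", "1", and so on, and transposes the rows into columns. A minimal plain-Ruby sketch of that branch, assuming the result is a Hash of columns instead of a DataFrame (columns_from_rows is a hypothetical name):

```ruby
# Plain-Ruby sketch (no Daru required); columns_from_rows is hypothetical.
def columns_from_rows(rows, order: nil)
  unless rows.all? { |row| row.size == rows[0].size }
    raise ArgumentError, 'All rows must have same length'
  end

  # Default column names are the Strings "0", "1", ... as in the Array branch.
  order ||= Array.new(rows[0].size) { |i| i.to_s }

  # Transposing turns a list of rows into a list of columns.
  order.zip(rows.transpose).to_h
end

cols = columns_from_rows([[1, 2, 3], [4, 5, 6]], order: %i[a b c])
p cols[:a]  # [1, 4]
p cols[:c]  # [3, 6]
```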

Instance Method Details

#==(other) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 1976

def == other
  self.class == other.class   and
  @size      == other.size    and
  @index     == other.index   and
  @vectors   == other.vectors and
  @vectors.to_a.all? { |v| self[v] == other[v] }
end

#[](*names) ⇒ `Object`

Access row or vector. Specify the name of the row/vector followed by the axis (:row or :vector). Defaults to :vector. Use of this method is not recommended for accessing rows or vectors. Use df.row[:a] to access the row with index :a, or df.vector[:vec] to access the vector with index :vec.

# File 'lib/daru/dataframe.rb', line 362

def [](*names)
  if names[-1] == :vector or names[-1] == :row
    axis = names[-1]
    names = names[0..-2]
  else
    axis = :vector
  end

  if axis == :vector
    access_vector *names
  elsif axis == :row
    access_row *names
  else
    raise IndexError, "Expected axis to be row or vector not #{axis}"
  end
end

#[]=(*args) ⇒ `Object`

Insert a new row/vector of the specified name or modify a previous row. Instead of using this method directly, use df.row[:a] = [1,2,3] to set/create a row :a to [1,2,3], or df.vector[:vec] = [1,2,3] for vectors.

If a Daru::Vector is specified after the equals sign, the indexes of the vector will be matched against the row/vector indexes of the DataFrame before an insertion is performed. Unmatched indexes will be set to nil.

# File 'lib/daru/dataframe.rb', line 386

def []=(*args)
  axis = args.include?(:row) ? :row : :vector
  args.delete :vector
  args.delete :row

  name = args[0..-2]
  vector = args[-1]

  if axis == :vector
    insert_or_modify_vector name, vector
  elsif axis == :row
    insert_or_modify_row name, vector
  else
    raise IndexError, "Expected axis to be row or vector, not #{axis}."
  end
end

#_dump(depth) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 1899

def _dump depth
  Marshal.dump({
    data:  @data,
    index: @index.to_a,
    order: @vectors.to_a,
    name:  @name
    })
end

#add_row(row, index = nil) ⇒ `Object`




# File 'lib/daru/dataframe.rb', line 408

def add_row row, index=nil
  self.row[index || @size] = row
end

#add_vector(n, vector) ⇒ `Object`




# File 'lib/daru/dataframe.rb', line 412

def add_vector n, vector
  self[n] = vector
end

#add_vectors_by_split(name, join = '-', sep = Daru::SPLIT_TOKEN) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 1097

def add_vectors_by_split(name,join='-',sep=Daru::SPLIT_TOKEN)
  split = self[name].split_by_separator(sep)
  split.each { |k,v| self[(name.to_s + join + k.to_s).to_sym] = v }
end

#add_vectors_by_split_recode(name_, join = '-', sep = Daru::SPLIT_TOKEN) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 1683

def add_vectors_by_split_recode(name_, join='-', sep=Daru::SPLIT_TOKEN)
  split = self[name_].split_by_separator(sep)
  i = 1
  split.each { |k,v|
    new_field = name_.to_s + join + i.to_s
    v.rename name_.to_s + ":" + k.to_s
    self[new_field.to_sym] = v
    i += 1
  }
end

#all?(axis = :vector, &block) ⇒ `Boolean`

Works like Array#all?

Examples:

Using all?

df = Daru::DataFrame.new({a: [1,2,3,4,5], b: ['a', 'b', 'c', 'd', 'e']})
df.all?(:row) do |row|
  row[:a] < 10
end #=> true

Parameters:

axis (Symbol) (defaults to: :vector) —

(:vector) The axis to iterate over. Can be :vector or :row. A Daru::Vector object is yielded in the block.

Returns:

(Boolean)

# File 'lib/daru/dataframe.rb', line 1153

def all? axis=:vector, &block
  if axis == :vector or axis == :column
    @data.all?(&block)
  elsif axis == :row
    each_row do |row|
      return false unless yield(row)
    end
    return true
  else
    raise ArgumentError, "Unidentified axis #{axis}"
  end
end

#any?(axis = :vector, &block) ⇒ `Boolean`

Works like Array#any?.

Examples:

Using any?

df = Daru::DataFrame.new({a: [1,2,3,4,5], b: ['a', 'b', 'c', 'd', 'e']})
df.any?(:row) do |row|
  row[:a] < 3 and row[:b] == 'b'
end #=> true

Parameters:

axis (Symbol) (defaults to: :vector) —

(:vector) The axis to iterate over. Can be :vector or :row. A Daru::Vector object is yielded in the block.

Returns:

(Boolean)

# File 'lib/daru/dataframe.rb', line 1131

def any? axis=:vector, &block
  if axis == :vector or axis == :column
    @data.any?(&block)
  elsif axis == :row
    each_row do |row|
      return true if yield(row)
    end
    return false
  else
    raise ArgumentError, "Unidentified axis #{axis}"
  end
end
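
With axis :row, both #all? and #any? walk the rows and stop at the first row that decides the answer. The sketch below shows the row-wise case on column-oriented data in plain Ruby, reusing the values from the two examples above. It assumes rows are modelled as Hashes of name => value, and rows_of is a hypothetical helper.

```ruby
# Plain-Ruby sketch (no Daru required); rows_of is a hypothetical helper.
# Converts a Hash of equally sized columns into an Array of row Hashes.
def rows_of(columns)
  size = columns.values.first.size
  Array.new(size) { |i| columns.transform_values { |col| col[i] } }
end

rows = rows_of({ a: [1, 2, 3, 4, 5], b: %w[a b c d e] })

# Array#all? and Array#any? stop at the first deciding row.
p rows.all? { |row| row[:a] < 10 }                   # true
p rows.any? { |row| row[:a] < 3 && row[:b] == 'b' }  # true
```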

#bootstrap(n = nil) ⇒ `Daru::DataFrame`

Creates a DataFrame of n rows drawn at random, with replacement, from this DataFrame. If n is not given, uses the original number of rows.

Returns:

(Daru::DataFrame)

# File 'lib/daru/dataframe.rb', line 888

def bootstrap(n=nil)
  n ||= nrows
  ds_boot = Daru::DataFrame.new({}, order: @vectors)
  n.times do
    ds_boot.add_row(row[rand(n)])
  end
  ds_boot.update
  ds_boot
end
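
Bootstrapping here means resampling rows with replacement. The following plain-Ruby sketch illustrates the idea on an Array of rows. It is not the Daru implementation: bootstrap_rows and its rng parameter are hypothetical, and positions are drawn from the full row range, whereas the listing above calls rand(n).

```ruby
# Plain-Ruby sketch (no Daru required); bootstrap_rows is hypothetical.
def bootstrap_rows(rows, n = nil, rng: Random.new)
  n ||= rows.size
  # Draw n positions independently, so the same row may appear more than once.
  Array.new(n) { rows[rng.rand(rows.size)] }
end

rows = [[1, 'a'], [2, 'b'], [3, 'c']]
# Passing a seeded generator makes the resample reproducible.
sample = bootstrap_rows(rows, 5, rng: Random.new(42))
p sample.size  # 5
```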

#clone(*vectors_to_clone) ⇒ `Object`

Returns a 'view' of the DataFrame, i.e. the object IDs of vectors are preserved.

Arguments

vectors_to_clone - Names of vectors to clone. Optional. Will return a view of the whole data frame otherwise.

# File 'lib/daru/dataframe.rb', line 455

def clone *vectors_to_clone
  vectors_to_clone.flatten! unless vectors_to_clone.all? { |a| !a.is_a?(Array) }
  return super if vectors_to_clone.empty?

  h = vectors_to_clone.inject({}) do |hsh, vec|
    hsh[vec] = self[vec]
    hsh
  end
  Daru::DataFrame.new(h, clone: false)
end

#clone_only_valid ⇒ `Object`

Returns a ‘shallow’ copy of DataFrame if missing data is not present, or a full copy of only valid data if missing data is present.

# File 'lib/daru/dataframe.rb', line 468

def clone_only_valid
  if has_missing_data?
    dup_only_valid
  else
    clone
  end
end

#clone_structure ⇒ `Object`

Only clone the structure of the DataFrame.




# File 'lib/daru/dataframe.rb', line 444

def clone_structure
  Daru::DataFrame.new([], order: @vectors.dup, index: @index.dup, name: @name)
end

#collect(axis = :vector, &block) ⇒ `Object`

Iterate over a row or vector and return results in a Daru::Vector. Specify axis with :vector or :row. Defaults to :vector.

Description

The #collect iterator works similarly to #map; the only difference is that it returns a Daru::Vector comprising the results of each block run. The resultant Vector has the same index as that of the axis over which collect has iterated. It also accepts the optional axis argument.

Arguments

axis - The axis to iterate over. Can be :vector (or :column) or :row. Defaults to :vector.

# File 'lib/daru/dataframe.rb', line 579

def collect axis=:vector, &block
  if axis == :vector or axis == :column
    collect_vectors(&block)
  elsif axis == :row
    collect_rows(&block)
  else
    raise ArgumentError, "Unknown axis #{axis}"
  end
end

#collect_matrix ⇒ `::Matrix`

Generate a matrix, based on vector names of the DataFrame.

Returns:

(::Matrix)

# File 'lib/daru/dataframe.rb', line 842

def collect_matrix
  return to_enum(:collect_matrix) unless block_given?

  vecs = vectors.to_a
  rows = vecs.collect { |row|
    vecs.collect { |col|
      yield row,col
    }
  }

  Matrix.rows(rows)
end

#collect_row_with_index(&block) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 804

def collect_row_with_index &block
  return to_enum(:collect_row_with_index) unless block_given?

  data = []
  each_row_with_index do |row, i|
    data.push yield(row, i)
  end

  Daru::Vector.new(data, index: @index)
end

#collect_rows(&block) ⇒ `Object`

Retrieves a Daru::Vector, based on the result of calculation performed on each row.

# File 'lib/daru/dataframe.rb', line 793

def collect_rows &block
  return to_enum(:collect_rows) unless block_given?

  data = []
  each_row do |row|
    data.push yield(row)
  end

  Daru::Vector.new(data, index: @index)
end

#collect_vector_with_index(&block) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 828

def collect_vector_with_index &block
  return to_enum(:collect_vector_with_index) unless block_given?

  data = []
  each_vector_with_index do |vec, i|
    data.push yield(vec, i)
  end

  Daru::Vector.new(data, index: @vectors)
end

#collect_vectors(&block) ⇒ `Object`

Retrieves a Daru::Vector, based on the result of calculation performed on each vector.

# File 'lib/daru/dataframe.rb', line 817

def collect_vectors &block
  return to_enum(:collect_vectors) unless block_given?

  data = []
  each_vector do |vec|
    data.push yield(vec)
  end

  Daru::Vector.new(data, index: @vectors)
end

#column(name) ⇒ `Object`

Access a vector by name.




# File 'lib/daru/dataframe.rb', line 404

def column name
  vector[name]
end

#compute(text, &block) ⇒ `Object`

Returns a vector, based on a string with a calculation based on vector.

The calculation is evaluated with instance_eval, so you can use any variable or expression that is valid Ruby.

For example:

a = Daru::Vector.new [1,2]
b = Daru::Vector.new [3,4]
ds = Daru::DataFrame.new({:a => a,:b => b})
ds.compute("a+b")
=> Vector [4,6]

# File 'lib/daru/dataframe.rb', line 1029

def compute text, &block
  return instance_eval(&block) if block_given?
  instance_eval(text)
end
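
Because #compute hands its argument to instance_eval, bare names in the expression resolve against the receiver. The plain-Ruby sketch below shows that mechanism with a hypothetical stand-in class, TinyCalc, whose columns are ordinary Arrays (so the sum is written out with zip rather than relying on vector arithmetic). Since the text is executed as Ruby code, it should only ever come from a trusted source.

```ruby
# Hypothetical stand-in class illustrating instance_eval; not Daru code.
class TinyCalc
  def initialize(columns)
    @columns = columns
  end

  # Evaluates a String or a block with self set to this object, so bare
  # names such as `a` are resolved through method_missing below.
  def compute(text = nil, &block)
    return instance_eval(&block) if block_given?

    instance_eval(text)
  end

  def method_missing(name, *args, &block)
    @columns.key?(name) ? @columns[name] : super
  end

  def respond_to_missing?(name, include_private = false)
    @columns.key?(name) || super
  end
end

calc = TinyCalc.new({ a: [1, 2], b: [3, 4] })
p calc.compute('a.zip(b).map { |x, y| x + y }')  # [4, 6]
```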

#concat(other_df) ⇒ `Object`

Concatenate another DataFrame along corresponding columns. Very premature implementation. Use with caution.

# File 'lib/daru/dataframe.rb', line 1263

def concat other_df
  vectors = []
  @vectors.each do |v|
    vectors << self[v].to_a.dup.concat(other_df[v].to_a)
  end

  Daru::DataFrame.new(vectors, order: @vectors)
end

#create_sql(table, charset = "UTF8") ⇒ `Object`

Create an SQL CREATE TABLE statement based on a given Dataset.

Arguments

table - String specifying the name of the table that will be created in SQL.
charset - Character set. Default is "UTF8".

Examples:


ds = Daru::DataFrame.new({
 :id   => Daru::Vector.new([1,2,3,4,5]),
 :name => Daru::Vector.new(%w{Alex Peter Susan Mary John})
})
ds.create_sql('names')
 #=>"CREATE TABLE names (id INTEGER,\n name VARCHAR (255)) CHARACTER SET=UTF8;"

# File 'lib/daru/dataframe.rb', line 1710

def create_sql(table,charset="UTF8")
  sql    = "CREATE TABLE #{table} ("
  fields = self.vectors.to_a.collect do |f|
    v = self[f]
    f.to_s + " " + v.db_type
  end

  sql + fields.join(",\n ")+") CHARACTER SET=#{charset};"
end

#delete_row(index) ⇒ `Object`

Delete a row

# File 'lib/daru/dataframe.rb', line 869

def delete_row index
  idx = named_index_for index

  if @index.include? idx
    @index = Daru::Index.new(@index.to_a - [idx])
    self.each_vector do |vector|
      vector.delete_at idx
    end
  else
    raise IndexError, "Index #{index} does not exist."
  end

  set_size
end

#delete_vector(vector) ⇒ `Object`

Delete a vector

# File 'lib/daru/dataframe.rb', line 857

def delete_vector vector
  if @vectors.include? vector
    @data.delete_at @vectors[vector]
    @vectors = Daru::Index.new @vectors.to_a - [vector]
  else
    raise IndexError, "Vector #{vector} does not exist."
  end

  self
end

#dup(vectors_to_dup = nil) ⇒ `Object`

Duplicate the DataFrame entirely.

Arguments

vectors_to_dup - An Array specifying the names of Vectors to be duplicated. Will duplicate the entire DataFrame if not specified.

# File 'lib/daru/dataframe.rb', line 431

def dup vectors_to_dup=nil
  vectors_to_dup = @vectors.to_a unless vectors_to_dup

  src = []
  vectors_to_dup.each do |vec|
    src << @data[@vectors[vec]].to_a.dup
  end
  new_order = Daru::Index.new(vectors_to_dup)

  Daru::DataFrame.new src, order: new_order, index: @index.dup, name: @name, clone: true
end

#dup_only_valid(vecs = nil) ⇒ `Object`

Creates a new duplicate dataframe containing only rows without a single missing value.

# File 'lib/daru/dataframe.rb', line 478

def dup_only_valid vecs=nil
  rows_with_nil = @data.inject([]) do |memo, vector|
    memo.concat vector.missing_positions
    memo
  end.uniq

  row_indexes = @index.to_a
  (vecs.nil? ? self : dup(vecs)).row[*(row_indexes - rows_with_nil)]
end
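
The idea of collecting every position that holds a missing value and keeping only the remaining rows can be sketched in plain Ruby. The helper `rows_without_missing` and the Hash-of-Arrays layout are assumptions made for illustration, not Daru internals.

```ruby
# Plain-Ruby sketch: keep only the rows in which no column holds nil.
# `columns` (hypothetical) maps a column name to an Array of values.
def rows_without_missing(columns)
  size = columns.values.first.size
  # Positions at which at least one column is nil.
  missing = columns.values.flat_map { |values|
    values.each_index.select { |i| values[i].nil? }
  }.uniq
  keep = (0...size).to_a - missing
  columns.transform_values { |values| values.values_at(*keep) }
end

data  = { a: [1, nil, 3, 4], b: [10, 20, nil, 40] }
clean = rows_without_missing(data)
# clean => { a: [1, 4], b: [10, 40] }
```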

#each(axis = :vector, &block) ⇒ `Object`

Iterate over each row or vector of the DataFrame. Specify axis by passing :vector or :row as the argument. Default to :vector.

Description

`#each` works like Array#each. The default mode for `each` is to iterate over the columns of the DataFrame. To iterate over rows, pass the axis, i.e. `:row`, as an argument.

Arguments

axis - The axis to iterate over. Can be :vector (or :column) or :row. Defaults to :vector.

# File 'lib/daru/dataframe.rb', line 554

def each axis=:vector, &block
  if axis == :vector or axis == :column
    each_vector(&block)
  elsif axis == :row
    each_row(&block)
  else
    raise ArgumentError, "Unknown axis #{axis}"
  end
end

#each_index(&block) ⇒ `Object`

Iterate over each index of the DataFrame.

# File 'lib/daru/dataframe.rb', line 489

def each_index &block
  return to_enum(:each_index) unless block_given?

  @index.each(&block)
  self
end

#each_row(&block) ⇒ `Object`

Iterate over each row

# File 'lib/daru/dataframe.rb', line 521

def each_row(&block)
  return to_enum(:each_row) unless block_given?

  @index.each do |index|
    yield access_row(index)
  end

  self
end

#each_row_with_index(&block) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 531

def each_row_with_index(&block)
  return to_enum(:each_row_with_index) unless block_given?

  @index.each do |index|
    yield access_row(index), index
  end

  self
end

#each_vector(&block) ⇒ `Object` Also known as: each_column

Iterate over each vector

# File 'lib/daru/dataframe.rb', line 497

def each_vector(&block)
  return to_enum(:each_vector) unless block_given?

  @data.each(&block)

  self
end

#each_vector_with_index(&block) ⇒ `Object` Also known as: each_column_with_index

Iterate over each vector along with the name of the vector.

# File 'lib/daru/dataframe.rb', line 508

def each_vector_with_index(&block)
  return to_enum(:each_vector_with_index) unless block_given?

  @vectors.each do |vector|
    yield @data[@vectors[vector]], vector
  end

  self
end

#filter(axis = :vector, &block) ⇒ `Object`

Retain vectors or rows if the block returns a truthy value.

Description

For filtering out certain rows/vectors based on their values, use the #filter method. By default it iterates over vectors and keeps those vectors for which the block returns true. It accepts an optional axis argument which lets you specify whether you want to iterate over vectors or rows.

Arguments

axis - The axis to filter over. Can be :vector (or :column) or :row. Defaults to :vector.

Usage

# Filter vectors

df.filter do |vector|
  vector.type == :numeric and vector.median < 50
end

# Filter rows

df.filter(:row) do |row|
  row[:a] + row[:d] < 100
end

# File 'lib/daru/dataframe.rb', line 684

def filter axis=:vector, &block
  if axis == :vector or axis == :column
    filter_vectors(&block)
  elsif axis == :row
    filter_rows(&block)
  end
end

#filter_rows(&block) ⇒ `Object`

Iterates over each row and retains it in a new DataFrame if the block returns true for that row.

# File 'lib/daru/dataframe.rb', line 931

def filter_rows &block
  return to_enum(:filter_rows) unless block_given?

  df = Daru::DataFrame.new({}, order: @vectors.to_a)
  marked = []

  @index.each do |index|
    keep_row = yield access_row(index)
    marked << index if keep_row
  end

  marked.each do |idx|
    df.row[idx] = self[idx, :row]
  end

  df
end
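
The two-pass approach used here (first mark the rows to keep, then copy them into a new structure) can be sketched in plain Ruby. The `filter_rows` helper below works on a hypothetical Hash of Arrays and is not the Daru method itself.

```ruby
# Plain-Ruby sketch of row filtering over columnar data.
# `columns` (hypothetical) maps a column name to an Array of values.
def filter_rows(columns)
  size = columns.values.first.size
  # First pass: mark the positions whose row makes the block truthy.
  marked = (0...size).select do |i|
    row = columns.transform_values { |values| values[i] }
    yield row
  end
  # Second pass: copy only the marked positions into a new Hash.
  columns.transform_values { |values| values.values_at(*marked) }
end

data = { a: [10, 60, 30], d: [20, 70, 90] }
kept = filter_rows(data) { |row| row[:a] + row[:d] < 100 }
# kept => { a: [10], d: [20] }
```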

#filter_vector(vec) ⇒ `Object`

Creates a new Vector containing the values of the given vector for those rows on which the block returns true.

# File 'lib/daru/dataframe.rb', line 920

def filter_vector vec
  d = []
  each_row do |row|
    d.push(row[vec]) if yield row
  end

  Daru::Vector.new(d)
end

#filter_vectors(&block) ⇒ `Object`

Iterates over each vector and retains it in a new DataFrame if the block returns true for that vector.

# File 'lib/daru/dataframe.rb', line 951

def filter_vectors &block
  return to_enum(:filter_vectors) unless block_given?

  df = self.dup
  df.keep_vector_if &block

  df
end

#group_by(*vectors) ⇒ `Object`

Group elements by vector to perform operations on them. Returns a Daru::Core::GroupBy object. See the Daru::Core::GroupBy docs for a detailed list of possible operations.

Arguments

vectors - An Array containing names of vectors to group by.

Usage

df = Daru::DataFrame.new({
  a: %w{foo bar foo bar   foo bar foo foo},
  b: %w{one one two three two two one three},
  c:   [1  ,2  ,3  ,1    ,3  ,6  ,3  ,8],
  d:   [11 ,22 ,33 ,44   ,55 ,66 ,77 ,88]
})
df.group_by([:a,:b,:c]).groups
#=> {["bar", "one", 2]=>[1],
# ["bar", "three", 1]=>[3],
# ["bar", "two", 6]=>[5],
# ["foo", "one", 1]=>[0],
# ["foo", "one", 3]=>[6],
# ["foo", "three", 8]=>[7],
# ["foo", "two", 3]=>[2, 4]}

# File 'lib/daru/dataframe.rb', line 1237

def group_by *vectors
  vectors.flatten!
  vectors.each { |v| raise(ArgumentError, "Vector #{v} does not exist") unless
    has_vector?(v) }

  Daru::Core::GroupBy.new(self, vectors)
end
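
A groups Hash like the one in the usage example maps each distinct combination of key values to the row positions where it occurs. A plain-Ruby sketch of that mapping follows; `groups_for` and the Hash-of-Arrays input are assumptions for illustration and do not reflect how Daru::Core::GroupBy is implemented.

```ruby
# Plain-Ruby sketch: map each combination of key values to its row positions.
# `columns` (hypothetical) maps a column name to an Array of values.
def groups_for(columns, keys)
  size   = columns[keys.first].size
  groups = Hash.new { |hash, key| hash[key] = [] }
  size.times do |i|
    groups[keys.map { |k| columns[k][i] }] << i
  end
  # Sorted by group key only to mirror the ordering in the usage example.
  groups.sort_by { |key, _| key }.to_h
end

data = {
  a: %w{foo bar foo bar foo bar foo foo},
  b: %w{one one two three two two one three},
  c: [1, 2, 3, 1, 3, 6, 3, 8]
}
groups = groups_for(data, [:a, :b, :c])
# groups[["foo", "two", 3]] => [2, 4]
```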

#has_missing_data? ⇒ `Boolean` Also known as: flawed?

Returns:

(Boolean)




# File 'lib/daru/dataframe.rb', line 1053

def has_missing_data?
  !!@data.any? { |v| v.has_missing_data? }
end

#has_vector?(vector) ⇒ `Boolean`

Check if a vector is present

Returns:

(Boolean)




# File 'lib/daru/dataframe.rb', line 1118

def has_vector? vector
  @vectors.include? vector
end

#head(quantity = 10) ⇒ `Object` Also known as: first

The first quantity rows of the DataFrame (ten by default).

Parameters:

quantity (Fixnum) (defaults to: 10) —

(10) The number of elements to display from the top.




# File 'lib/daru/dataframe.rb', line 1169

def head quantity=10
  self[0..(quantity-1), :row]
end

#inspect(spacing = 10, threshold = 15) ⇒ `Object`

Pretty print in a nice table format for the command line (irb/pry/iruby)

# File 'lib/daru/dataframe.rb', line 1938

def inspect spacing=10, threshold=15
  longest = [@name.to_s.size,
             (@vectors.map(&:to_s).map(&:size).max || 0),
             (@index  .map(&:to_s).map(&:size).max || 0),
             (@data   .map{ |v| v.map(&:to_s).map(&:size).max}.max || 0)].max

  name      = @name || 'nil'
  content   = ""
  longest   = spacing if longest > spacing
  formatter = "\n"

  (@vectors.size + 1).times { formatter += "%#{longest}.#{longest}s " }
  content += "\n#<" + self.class.to_s + ":" + self.object_id.to_s + " @name = " +
                name.to_s + " @size = " + @size.to_s + ">"
  content += sprintf formatter, "" , *@vectors.map(&:to_s)
  row_num  = 1

  self.each_row_with_index do |row, index|
    content += sprintf formatter, index.to_s, *row.to_hash.values.map { |e| (e || 'nil').to_s }
    row_num += 1
    if row_num > threshold
      dots = []

      (@vectors.size + 1).times { dots << "..." }
      content += sprintf formatter, *dots
      break
    end
  end
  content += "\n"

  content
end

#join(other_df, opts = {}) ⇒ `Daru::DataFrame`

Join 2 DataFrames with SQL style joins. Currently supports inner, left outer, right outer and full outer joins.

Examples:

Inner Join

left = Daru::DataFrame.new({
  :id   => [1,2,3,4],
  :name => ['Pirate', 'Monkey', 'Ninja', 'Spaghetti']
})
right = Daru::DataFrame.new({
  :id => [1,2,3,4],
  :name => ['Rutabaga', 'Pirate', 'Darth Vader', 'Ninja']
})
left.join(right, how: :inner, on: [:name])
#=>
##<Daru::DataFrame:82416700 @name = 74c0811b-76c6-4c42-ac93-e6458e82afb0 @size = 2>
#                 id_1       name       id_2
#         0          1     Pirate          2
#         1          3      Ninja          4

Parameters:

other_df (Daru::DataFrame) —

Another DataFrame on which the join is to be performed.
opts (Hash) (defaults to: {}) —

Options Hash
:how (Symbol) —

The type of join to perform, for example :inner as in the example above.
:on (Array) —

The names of the vectors to join on, for example [:name].

Returns:

(Daru::DataFrame)




# File 'lib/daru/dataframe.rb', line 1602

def join(other_df,opts={})
  Daru::Core::Merge.join(self, other_df, opts)
end
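
The result of the inner join in the example can be reproduced with a plain-Ruby sketch. The `inner_join` helper below works on hypothetical Arrays of row Hashes and joins on a single key; the `_1`/`_2` suffixing of clashing columns is modelled on the example output and is not taken from Daru::Core::Merge.

```ruby
# Plain-Ruby sketch of an inner join on one key; not Daru's Merge code.
# Rows are hypothetical Hashes; clashing non-key columns get _1 / _2 suffixes.
def inner_join(left_rows, right_rows, on)
  right_by_key = right_rows.group_by { |row| row[on] }
  left_rows.flat_map do |l|
    (right_by_key[l[on]] || []).map do |r|
      joined = { on => l[on] }
      (l.keys - [on]).each { |k| joined[r.key?(k) ? :"#{k}_1" : k] = l[k] }
      (r.keys - [on]).each { |k| joined[l.key?(k) ? :"#{k}_2" : k] = r[k] }
      joined
    end
  end
end

left = [1, 2, 3, 4].zip(['Pirate', 'Monkey', 'Ninja', 'Spaghetti'])
                   .map { |id, name| { id: id, name: name } }
right = [1, 2, 3, 4].zip(['Rutabaga', 'Pirate', 'Darth Vader', 'Ninja'])
                    .map { |id, name| { id: id, name: name } }
result = inner_join(left, right, :name)
# result => [{ name: "Pirate", id_1: 1, id_2: 2 }, { name: "Ninja", id_1: 3, id_2: 4 }]
```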

#keep_row_if(&block) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 898

def keep_row_if &block
  deletion = []

  @index.each do |index|
    keep_row = yield access_row(index)

    deletion << index unless keep_row
  end
  deletion.each { |idx|
    delete_row idx
  }
end

#keep_vector_if(&block) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 911

def keep_vector_if &block
  @vectors.each do |vector|
    keep_vector = yield @data[@vectors[vector]], vector

    delete_vector vector unless keep_vector
  end
end

#map(axis = :vector, &block) ⇒ `Object`

Map over each vector or row of the data frame according to the argument specified. Will return an Array of the resulting elements. To map over each row/vector and get a DataFrame, see #recode.

Description

The #map iterator works like Array#map. The value returned by each run of the block is added to an Array and the Array is returned. This method also accepts an axis argument, like #each. The default is :vector.

Arguments

axis - The axis to map over. Can be :vector (or :column) or :row. Defaults to :vector.

# File 'lib/daru/dataframe.rb', line 605

def map axis=:vector, &block
  if axis == :vector or axis == :column
    map_vectors(&block)
  elsif axis == :row
    map_rows(&block)
  else
    raise ArgumentError, "Unknown axis #{axis}"
  end
end

#map!(axis = :vector, &block) ⇒ `Object`

Destructive map. Modifies the DataFrame. Each run of the block must return a Daru::Vector. You can specify the axis to map over as the argument. Default to :vector.

Arguments

axis - The axis to map over. Can be :vector (or :column) or :row. Defaults to :vector.

# File 'lib/daru/dataframe.rb', line 623

def map! axis=:vector, &block
  if axis == :vector or axis == :column
    map_vectors!(&block)
  elsif axis == :row
    map_rows!(&block)
  end
end

#map_rows(&block) ⇒ `Object`

Map each row

# File 'lib/daru/dataframe.rb', line 757

def map_rows(&block)
  return to_enum(:map_rows) unless block_given?

  dt = []
  each_row do |row|
    dt << yield(row)
  end

  dt
end

#map_rows!(&block) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 779

def map_rows!(&block)
  return to_enum(:map_rows!) unless block_given?

  index.dup.each do |i|
    r = yield self.row[i]
    r.is_a?(Daru::Vector) or raise TypeError, "Returned object must be Daru::Vector not #{r.class}"
    self.row[i] = r
  end

  self
end

#map_rows_with_index(&block) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 768

def map_rows_with_index(&block)
  return to_enum(:map_rows_with_index) unless block_given?

  dt = []
  each_row_with_index do |row, index|
    dt << yield(row, index)
  end

  dt
end

#map_vectors(&block) ⇒ `Object`

Map each vector and return an Array.

# File 'lib/daru/dataframe.rb', line 720

def map_vectors(&block)
  return to_enum(:map_vectors) unless block_given?

  arry = []
  @data.each do |vec|
    arry << yield(vec)
  end

  arry
end

#map_vectors!(&block) ⇒ `Object`

Destructive form of #map_vectors

# File 'lib/daru/dataframe.rb', line 732

def map_vectors!(&block)
  return to_enum(:map_vectors!) unless block_given?

  vectors.dup.each do |n|
    v = yield self[n]
    v.is_a?(Daru::Vector) or raise TypeError, "Must return a Daru::Vector not #{v.class}"
    self[n] = v
  end

  self
end

#map_vectors_with_index(&block) ⇒ `Object`

Map vectors along with the index.

# File 'lib/daru/dataframe.rb', line 745

def map_vectors_with_index(&block)
  return to_enum(:map_vectors_with_index) unless block_given?

  dt = []
  each_vector_with_index do |vector, name|
    dt << yield(vector, name)
  end

  dt
end

#merge(other_df) ⇒ `Daru::DataFrame`

Merge vectors from two DataFrames. In case of a name collision, the vector names are changed to x_1, x_2, and so on.

Returns:

(Daru::DataFrame)

# File 'lib/daru/dataframe.rb', line 1560

def merge other_df
  raise "Number of rows must be equal in this: #{nrows} and other: #{other_df.nrows}" unless nrows == other_df.nrows

  new_fields = (@vectors.to_a + other_df.vectors.to_a)
                    .recode_repeated
                    .map(&:to_sym)
  df_new     = DataFrame.new({}, order: new_fields)

  (0...nrows).to_a.each do |i|
    row = self.row[i].to_a + other_df.row[i].to_a
    df_new.add_row(row)
  end

  df_new.update
  df_new
end
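
The renaming rule described above can be sketched as follows. This is an assumption about the behaviour (only names that occur more than once receive a numeric suffix), written as a hypothetical `rename_repeated` helper; it is not the `recode_repeated` extension that the method calls.

```ruby
# Plain-Ruby sketch: names that occur more than once get _1, _2, ... suffixes,
# while unique names are left unchanged (assumed behaviour, see lead-in).
def rename_repeated(names)
  counts = names.each_with_object(Hash.new(0)) { |name, h| h[name] += 1 }
  seen   = Hash.new(0)
  names.map do |name|
    next name if counts[name] == 1
    seen[name] += 1
    :"#{name}_#{seen[name]}"
  end
end

renamed = rename_repeated([:id, :x, :y, :x])
# renamed => [:id, :x_1, :y, :x_2]
```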

#missing_values_rows(missing_values = [nil]) ⇒ `Object` Also known as: vector_missing_values

Return a vector with the number of missing values in each row.

Arguments

missing_values - An Array of the values that should be treated as ‘missing’. The default missing value is nil.

# File 'lib/daru/dataframe.rb', line 1040

def missing_values_rows missing_values=[nil]
  number_of_missing = []
  each_row do |row|
    row.missing_values = missing_values
    number_of_missing << row.missing_positions.size
  end

  Daru::Vector.new number_of_missing, index: @index, name: "#{@name}_missing_rows"
end

#ncols ⇒ `Object`

The number of vectors




# File 'lib/daru/dataframe.rb', line 1113

def ncols
  shape[1]
end

#nest(*tree_keys, &block) ⇒ `Object`

Return a nested hash that uses the values of the given vectors as keys, one level per vector. Each leaf is an array of hashes holding the remaining values of every matching row. If a block is provided, it is used to compute the leaf value instead; it receives the current row, the innermost hash built so far, and the key under which the value will be stored.

# File 'lib/daru/dataframe.rb', line 1063

def nest *tree_keys, &block
  tree_keys = tree_keys[0] if tree_keys[0].is_a? Array
  out = {}

  each_row do |row|
    current = out
    # Create tree
    tree_keys[0, tree_keys.size-1].each do |f|
      root = row[f]
      current[root] ||= {}
      current = current[root]
    end
    name = row[tree_keys.last]
    if !block
      current[name] ||= []
      current[name].push(row.to_hash.delete_if { |key,value| tree_keys.include? key})
    else
      current[name] = block.call(row, current,name)
    end
  end

  out
end
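
The nesting logic without a block can be sketched in plain Ruby over an Array of row Hashes. `nest_rows` is a hypothetical helper written for this illustration.

```ruby
# Plain-Ruby sketch of the nesting logic over an Array of row Hashes.
def nest_rows(rows, *tree_keys)
  out = {}
  rows.each do |row|
    current = out
    # Walk (and create) one Hash level per key except the last.
    tree_keys[0...-1].each do |key|
      current = (current[row[key]] ||= {})
    end
    # The last key maps to an Array holding the remaining fields of each row.
    leaf = row[tree_keys.last]
    (current[leaf] ||= []) << row.reject { |k, _| tree_keys.include?(k) }
  end
  out
end

rows = [
  { a: 'foo', b: 'one', c: 1 },
  { a: 'foo', b: 'two', c: 2 },
  { a: 'bar', b: 'one', c: 3 },
  { a: 'foo', b: 'one', c: 4 }
]
nested = nest_rows(rows, :a, :b)
# nested["foo"]["one"] => [{ c: 1 }, { c: 4 }]
```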

#nrows ⇒ `Object`

The number of rows




# File 'lib/daru/dataframe.rb', line 1108

def nrows
  shape[0]
end

#numeric_vector_names ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 1370

def numeric_vector_names
  numerics = []

  @vectors.each do |v|
    numerics << v if (self[v].type == :numeric)
  end
  numerics
end

#numeric_vectors ⇒ `Object`

Return the indexes of all the numeric vectors. Will include vectors that contain nils along with numbers.

# File 'lib/daru/dataframe.rb', line 1361

def numeric_vectors
  numerics = []

  each_vector_with_index do |vec, i|
    numerics << i if(vec.type == :numeric)
  end
  numerics
end

#one_to_many(parent_fields, pattern) ⇒ `Object`

Creates a new dataset for one-to-many relations on a dataset, based on a pattern of field names.

For example, suppose you have a survey on number of children with this structure:

id, name, child_name_1, child_age_1, child_name_2, child_age_2

With

ds.one_to_many([:id], "child_%v_%n")

the fields named in the first parameter are copied verbatim to the new dataset, and the fields that match the pattern in the second parameter are added as one case for each distinct %n.

Examples:

cases=[
  ['1','george','red',10,'blue',20,nil,nil],
  ['2','fred','green',15,'orange',30,'white',20],
  ['3','alfred',nil,nil,nil,nil,nil,nil]
]
ds=Daru::DataFrame.rows(cases, order: [:id, :name, :car_color1, :car_value1, :car_color2, :car_value2, :car_color3, :car_value3])
ds.one_to_many([:id],'car_%v%n').to_matrix
#=> Matrix[
#   ["red", "1", 10],
#   ["blue", "1", 20],
#   ["green", "2", 15],
#   ["orange", "2", 30],
#   ["white", "2", 20]
#   ]

# File 'lib/daru/dataframe.rb', line 1634

def one_to_many(parent_fields, pattern)
  re      = Regexp.new pattern.gsub("%v","(.+?)").gsub("%n","(\\d+?)")
  ds_vars = parent_fields.dup
  vars    = []
  max_n   = 0
  h       = parent_fields.inject({}) { |a,v|
    a[v] = Daru::Vector.new([])
    a
  }
  # Adding _col_id
  h['_col_id'] = Daru::Vector.new([])
  ds_vars.push('_col_id')

  @vectors.each do |f|
    if f =~ re
      if !vars.include? $1
        vars.push($1)
        h[$1] = Daru::Vector.new([])
      end
      max_n = $2.to_i if max_n < $2.to_i
    end
  end
  ds = DataFrame.new(h, order: ds_vars+vars)

  each_row do |row|
    row_out = {}
    parent_fields.each do |f|
      row_out[f] = row[f]
    end

    max_n.times do |n1|
      n  = n1+1
      any_data = false
      vars.each do |v|
        data = row[pattern.gsub("%v",v.to_s).gsub("%n",n.to_s)]
        row_out[v] = data
        any_data = true if !data.nil?
      end

      if any_data
        row_out['_col_id'] = n
        ds.add_row(row_out)
      end
    end
  end
  ds.update
  ds
end
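
The pattern handling is the least obvious part of this method. The sketch below applies the same two `gsub` substitutions to the pattern from the example and matches the result against plain field-name strings; the `fields` Array is a stand-in for the vector names.

```ruby
# Sketch of how the pattern becomes a regular expression and is matched
# against field names, using the same substitutions as the method above.
pattern = 'car_%v%n'
re = Regexp.new(pattern.gsub("%v", "(.+?)").gsub("%n", "(\\d+?)"))

fields = %w[id name car_color1 car_value1 car_color2 car_value2 car_color3 car_value3]
vars  = []
max_n = 0
fields.each do |field|
  match = re.match(field)
  next unless match
  # First capture: the variable name; second capture: the case number.
  vars << match[1] unless vars.include?(match[1])
  max_n = [max_n, match[2].to_i].max
end
# vars  => ["color", "value"]
# max_n => 3
```

Because the expression is not anchored and `\d+?` is lazy, a two-digit suffix such as `car_color10` would be captured as case number 1 by this expression.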

#only_numerics(opts = {}) ⇒ `Object`

Return a DataFrame of only the numerical Vectors. If clone: false is specified as option, only a view of the Vectors will be returned. Defaults to clone: true.

# File 'lib/daru/dataframe.rb', line 1382

def only_numerics opts={}
  cln = opts[:clone] == false ? false : true
  nv = numeric_vectors
  arry = nv.inject([]) do |arr, v|
    arr << self[v]
    arr
  end

  order = Index.new(nv)
  Daru::DataFrame.new(arry, clone: cln, order: order, index: @index)
end

#pivot_table(opts = {}) ⇒ `Object`

Pivots a data frame on specified vectors and applies an aggregate function to quickly generate a summary.

Options

:index - Keys to group by on the pivot table row index. Pass vector names contained in an Array.

:vectors - Keys to group by on the pivot table column index. Pass vector names contained in an Array.

:agg - Function to aggregate the grouped values. Defaults to :mean. Can be any of the statistics functions applicable to Vectors, which can be found in the Daru::Maths::Statistics::Vector module.

:values - Columns to aggregate. Optional. If omitted, all numeric columns not named in :index or :vectors are used.

Usage

df = Daru::DataFrame.new({
  a: ['foo'  ,  'foo',  'foo',  'foo',  'foo',  'bar',  'bar',  'bar',  'bar'],
  b: ['one'  ,  'one',  'one',  'two',  'two',  'one',  'one',  'two',  'two'],
  c: ['small','large','large','small','small','large','small','large','small'],
  d: [1,2,2,3,3,4,5,6,7],
  e: [2,4,4,6,6,8,10,12,14]
})
df.pivot_table(index: [:a], vectors: [:b], agg: :sum, values: :e)

#=>
# #<Daru::DataFrame:88342020 @name = 08cdaf4e-b154-4186-9084-e76dd191b2c9 @size = 2>
#            [:e, :one] [:e, :two]
#     [:bar]         18         26
#     [:foo]         10         12

Raises:

(ArgumentError)

# File 'lib/daru/dataframe.rb', line 1490

def pivot_table opts={}
  raise ArgumentError,
    "Specify grouping index" if !opts[:index] or opts[:index].empty?

  index   = opts[:index]
  vectors = opts[:vectors] || []
  aggregate_function = opts[:agg] || :mean
  values =
  if opts[:values].is_a?(Symbol)
    [opts[:values]]
  elsif opts[:values].is_a?(Array)
    opts[:values]
  else # nil
    (@vectors.to_a - (index | vectors)) & numeric_vector_names
  end

  raise IndexError, "No numeric vectors to aggregate" if values.empty?

  grouped  = group_by(index)

  unless vectors.empty?
    super_hash = {}
    values.each do |value|
      grouped.groups.each do |group_name, row_numbers|
        super_hash[group_name] ||= {}

        row_numbers.each do |num|
          arry = []
          arry << value
          vectors.each { |v| arry << self[v][num] }
          sub_hash = super_hash[group_name]
          sub_hash[arry] ||= []

          sub_hash[arry] << self[value][num]
        end
      end
    end

    super_hash.each_value do |sub_hash|
      sub_hash.each do |group_name, aggregates|
        sub_hash[group_name] = Daru::Vector.new(aggregates).send(aggregate_function)
      end
    end

    df_index = Daru::MultiIndex.from_tuples super_hash.keys

    vector_indexes = []
    super_hash.each_value do |sub_hash|
      vector_indexes.concat sub_hash.keys
    end

    df_vectors = Daru::MultiIndex.from_tuples vector_indexes.uniq
    pivoted_dataframe = Daru::DataFrame.new({}, index: df_index, order: df_vectors)

    super_hash.each do |row_index, sub_h|
      sub_h.each do |vector_index, val|
        # pivoted_dataframe[symbolize(vector_index)][symbolize(row_index)] = val
        pivoted_dataframe[vector_index][row_index] = val
      end
    end
    return pivoted_dataframe
  else
    grouped.send(aggregate_function)
  end
end
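
The grouping that produces the table in the usage example can be sketched in plain Ruby for the `:sum` case. `pivot_sum` and the Array of row Hashes are assumptions made for illustration; the sketch does not build MultiIndex objects as the method does.

```ruby
# Plain-Ruby sketch of a sum pivot; not Daru's implementation.
# `rows` (hypothetical) is an Array of row Hashes.
def pivot_sum(rows, index:, vectors:, value:)
  table = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  rows.each do |row|
    row_key = index.map { |k| row[k] }               # pivot table row label
    col_key = [value] + vectors.map { |k| row[k] }   # pivot table column label
    table[row_key][col_key] += row[value]
  end
  table
end

a = %w[foo foo foo foo foo bar bar bar bar]
b = %w[one one one two two one one two two]
e = [2, 4, 4, 6, 6, 8, 10, 12, 14]
rows  = a.zip(b, e).map { |x, y, z| { a: x, b: y, e: z } }
pivot = pivot_sum(rows, index: [:a], vectors: [:b], value: :e)
# pivot[["bar"]][[:e, "one"]] => 18
# pivot[["foo"]][[:e, "two"]] => 12
```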

#recast(opts = {}) ⇒ `Object`

Change dtypes of vectors by supplying a hash of :vector_name => :new_dtype

Usage

df = Daru::DataFrame.new({a: [1,2,3], b: [1,2,3], c: [1,2,3]})
df.recast a: :nmatrix, c: :nmatrix

# File 'lib/daru/dataframe.rb', line 1921

def recast opts={}
  opts.each do |vector_name, dtype|
    self[vector_name].cast(dtype: dtype)
  end
end

#recode(axis = :vector, &block) ⇒ `Object`

Maps over the DataFrame and returns a DataFrame. Each run of the block must return a Daru::Vector object. You can specify the axis to map over. Default to :vector.

Description

Recode works similarly to #map, but an important difference between the two is that recode returns a modified Daru::DataFrame instead of an Array. For this reason, #recode expects every run of the block to return a Daru::Vector.

Just like map and each, recode also accepts an optional axis argument.

Arguments

axis - The axis to map over. Can be :vector (or :column) or :row. Defaults to :vector.

# File 'lib/daru/dataframe.rb', line 648

def recode axis=:vector, &block
  if axis == :vector or axis == :column
    recode_vectors(&block)
  elsif axis == :row
    recode_rows(&block)
  end
end

#recode_rows(&block) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 706

def recode_rows &block
  block_given? or return to_enum(:recode_rows)

  df = self.dup
  df.each_row_with_index do |r, i|
    ret = yield r
    ret.is_a?(Daru::Vector) or raise TypeError, "Every iteration must return Daru::Vector not #{ret.class}"
    df.row[i] = ret
  end

  df
end

#recode_vectors(&block) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 692

def recode_vectors &block
  block_given? or return to_enum(:recode_vectors)

  df = self.dup
  df.each_vector_with_index do |v, i|
    ret = yield v
    ret.is_a?(Daru::Vector) or
      raise TypeError, "Every iteration must return Daru::Vector not #{ret.class}"
    df[*i] = ret
  end

  df
end

#reindex(new_index) ⇒ `Object`

Change the index of the DataFrame and preserve the labels of the previous indexing. New index can be Daru::Index or any of its subclasses.

Examples:

Reindexing DataFrame

df = Daru::DataFrame.new({a: [1,2,3,4], b: [11,22,33,44]},
  index: ['a','b','c','d'])
#=>
##<Daru::DataFrame:83278130 @name = b19277b8-c548-41da-ad9a-2ad8c060e273 @size = 4>
#                    a          b
#         a          1         11
#         b          2         22
#         c          3         33
#         d          4         44
df.reindex Daru::Index.new(['b', 0, 'a', 'g'])
#=>
##<Daru::DataFrame:83177070 @name = b19277b8-c548-41da-ad9a-2ad8c060e273 @size = 4>
#                    a          b
#         b          2         22
#         0        nil        nil
#         a          1         11
#         g        nil        nil

Parameters:

new_index (Daru::Index) —

The new Index for reindexing the DataFrame.

Raises:

(ArgumentError)

# File 'lib/daru/dataframe.rb', line 1305

def reindex new_index
  raise ArgumentError, "Must pass the new index of type Index or its "\
    "subclasses, not #{new_index.class}" unless new_index.kind_of?(Daru::Index)

  cl = Daru::DataFrame.new({}, order: @vectors, index: new_index, name: @name)
  new_index.each do |idx|
    if @index.include?(idx)
      cl.row[idx] = self.row[idx]
    else
      cl.row[idx] = [nil]*ncols
    end
  end

  cl
end
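
The reindexing rule (keep rows whose label exists in the old index, fill the rest with nils) can be sketched in plain Ruby. `reindex_rows` and the label-to-row Hash are hypothetical and used only for illustration.

```ruby
# Plain-Ruby sketch: rows for known labels are kept, unknown labels get nils.
# `rows_by_label` (hypothetical) maps a row label to an Array of row values.
def reindex_rows(rows_by_label, new_labels, ncols)
  new_labels.each_with_object({}) do |label, out|
    out[label] = rows_by_label.fetch(label) { [nil] * ncols }
  end
end

rows = { 'a' => [1, 11], 'b' => [2, 22], 'c' => [3, 33], 'd' => [4, 44] }
reindexed = reindex_rows(rows, ['b', 0, 'a', 'g'], 2)
# reindexed => { "b" => [2, 22], 0 => [nil, nil], "a" => [1, 11], "g" => [nil, nil] }
```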

#reindex_vectors(new_vectors) ⇒ `Object`

Raises:

(ArgumentError)

# File 'lib/daru/dataframe.rb', line 1245

def reindex_vectors new_vectors
  raise ArgumentError, "Must pass the new index of type Index or its "\
    "subclasses, not #{new_vectors.class}" unless new_vectors.kind_of?(Daru::Index)

  cl = Daru::DataFrame.new({}, order: new_vectors, index: @index, name: @name)
  new_vectors.each do |vec|
    if @vectors.include?(vec)
      cl[vec] = self[vec]
    else
      cl[vec] = [nil]*nrows
    end
  end

  cl
end

#rename(new_name) ⇒ `Object`

Rename the DataFrame.




# File 'lib/daru/dataframe.rb', line 1848

def rename new_name
  @name = new_name
end

#report_building(b) ⇒ `Object`

:nodoc: #

# File 'lib/daru/dataframe.rb', line 1399

def report_building(b) # :nodoc: #
  b.section(:name=>@name) do |g|
    g.text "Number of rows: #{nrows}"
    @vectors.each do |v|
      g.text "Element:[#{v}]"
      g.parse_element(self[v])
    end
  end
end

#row ⇒ `Object`

Access a row or set/create a row. Refer #[] and #[]= docs for details.

Usage

df.row[:a] # access row named ':a'
df.row[:b] = [1,2,3] # set row ':b' to [1,2,3]




# File 'lib/daru/dataframe.rb', line 421

def row
  Daru::Accessors::DataFrameByRow.new(self)
end

#save(filename) ⇒ `Object`

Use marshalling to save dataframe to a file.




# File 'lib/daru/dataframe.rb', line 1895

def save filename
  Daru::IO.save self, filename
end

#set_index(new_index, opts = {}) ⇒ `Object`

Set a particular column as the new index of the DataFrame. The column is removed from the DataFrame unless the :keep option is set.

Raises:

(ArgumentError)

# File 'lib/daru/dataframe.rb', line 1273

def set_index new_index, opts={}
  raise ArgumentError, "All elements in new index must be unique." if
    @size != self[new_index].uniq.size

  self.index = Daru::Index.new(self[new_index].to_a)
  self.delete_vector(new_index) unless opts[:keep]

  self
end

#shape ⇒ `Object`

Return the number of rows and columns of the DataFrame in an Array.




# File 'lib/daru/dataframe.rb', line 1103

def shape
  [@index.size, @vectors.size]
end

#sort(vector_order, opts = {}) ⇒ `Object`

Non-destructive version of #sort!




# File 'lib/daru/dataframe.rb', line 1452

def sort vector_order, opts={}
  self.dup.sort! vector_order, opts
end

#sort!(vector_order, opts = {}) ⇒ `Object`

Sorts a dataframe (ascending/descending) according to the given sequence of vectors, using the attributes provided in the blocks.

Usage

df = Daru::DataFrame.new({a: [-3,2,-1,4], b: [4,3,2,1]})

#<Daru::DataFrame:140630680 @name = 04e00197-f8d5-4161-bca2-93266bfabc6f @size = 4>
#            a          b
# 0         -3          4
# 1          2          3
# 2         -1          2
# 3          4          1
df.sort([:a], by: { a: lambda { |a,b| a.abs <=> b.abs } })

Parameters:

vector_order (Array) —

The order of vector names in which the DataFrame should be sorted.
opts (Hash) (defaults to: {}) —

The options to sort with.

Options Hash (opts):

:ascending (TrueClass, FalseClass, Array) — default: true —

Sort in ascending or descending order. Specify Array corresponding to order for multiple sort orders.
:by (Hash) — default: {|a, b| a <=> b} —

Specify attributes of objects to be used for sorting, for each vector name in order, as a hash of vector name and lambda pairs. If a lambda is not specified for a vector, the default will be used.

Raises:

(ArgumentError)

# File 'lib/daru/dataframe.rb', line 1434

def sort! vector_order, opts={}
  raise ArgumentError, "Required atleast one vector name" if vector_order.size < 1
  opts = {
    ascending: true,
    type: :quick_sort,
    by: {}
  }.merge(opts)

  opts[:by]        = create_logic_blocks vector_order, opts[:by]
  opts[:ascending] = sort_order_array vector_order, opts[:ascending]
  idx = @index.to_a
  send(opts[:type], vector_order, idx, opts[:by], opts[:ascending])
  self.index = Daru::Index.new(idx)

  self
end
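
The effect of a `:by` lambda on the usage example can be sketched in plain Ruby by sorting row positions with the comparator and then reading every column in that order. This illustrates the idea only; it uses Ruby's built-in sort rather than the routine selected by the `:type` option above.

```ruby
# Plain-Ruby sketch: sort row positions by a custom comparator on column a,
# then reorder every column with the resulting positions.
a  = [-3, 2, -1, 4]
b  = [4, 3, 2, 1]
by = lambda { |x, y| x.abs <=> y.abs }

order    = (0...a.size).sort { |i, j| by.call(a[i], a[j]) }
sorted_a = a.values_at(*order)
sorted_b = b.values_at(*order)
# order    => [2, 1, 0, 3]
# sorted_a => [-1, 2, -3, 4]
# sorted_b => [2, 3, 4, 1]
```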

#summary(method = :to_text) ⇒ `Object`

Generate a summary of this DataFrame with ReportBuilder.




# File 'lib/daru/dataframe.rb', line 1395

def summary(method = :to_text)
  ReportBuilder.new(no_title: true).add(self).send(method)
end

#tail(quantity = 10) ⇒ `Object` Also known as: last

The last quantity rows of the DataFrame (ten by default).

Parameters:

quantity (Fixnum) (defaults to: 10) —

(10) The number of elements to display from the bottom.




# File 'lib/daru/dataframe.rb', line 1178

def tail quantity=10
  self[(@size - quantity)..(@size-1), :row]
end

#to_a ⇒ `Object`

Converts the DataFrame into an Array with two elements. Element 0 is an Array of hashes, one per row, in which each key is a vector name and each value is the corresponding element. Element 1 is an Array of the row indexes of the DataFrame. The hash at a given position in element 0 belongs to the index at the same position in element 1.

# File 'lib/daru/dataframe.rb', line 1761

def to_a
  arry = [[],[]]
  self.each_row do |row|
    arry[0] << row.to_hash
  end
  arry[1] = @index.to_a

  arry
end

#to_gsl ⇒ `Object`

Convert all numeric vectors to GSL::Matrix

# File 'lib/daru/dataframe.rb', line 1721

def to_gsl
  numerics_as_arrays = []
  numeric_vectors.each do |n|
    numerics_as_arrays << self[n].to_a
  end

  GSL::Matrix.alloc *numerics_as_arrays.transpose
end

#to_hash ⇒ `Object`

Converts DataFrame to a hash with keys as vector names and values as the corresponding vectors.

# File 'lib/daru/dataframe.rb', line 1783

def to_hash
  hsh = {}
  @vectors.each_with_index do |vec_name, idx|
    hsh[vec_name] = @data[idx]
  end

  hsh
end

#to_html(threshold = 30) ⇒ `Object`

Convert to html for IRuby.

# File 'lib/daru/dataframe.rb', line 1793

def to_html threshold=30
  html = "<table>" +
    "<tr>" +
      "<th colspan=\"#{@vectors.size+1}\">" +
        "Daru::DataFrame:#{self.object_id} " + " rows: #{nrows} " + " cols: #{ncols}" +
      "</th>" +
    "</tr>"
  html +='<tr><th></th>'
  @vectors.each { |vector| html += '<th>' + vector.to_s + '</th>' }
  html += '</tr>'

  @index.each_with_index do |index, num|
    html += '<tr>'
    html += '<td>' + index.to_s + '</td>'

    self.row[index].each do |element|
      html += '<td>' + element.to_s + '</td>'
    end

    html += '</tr>'
    if num > threshold
      html += '<tr>'
      (@vectors.size + 1).times { html += '<td>...</td>' }
      html += '</tr>'

      last_index = @index.to_a.last
      last_row = self.row[last_index]
      html += '<tr>'
      html += "<td>" + last_index.to_s + "</td>"
      (0..(ncols - 1)).to_a.each do |i|
        html += '<td>' + last_row[i].to_s + '</td>'
      end
      html += '</tr>'
      break
    end
  end
  html += '</table>'

  html
end

#to_json(no_index = true) ⇒ `Object`

Convert to JSON. If no_index is true (the default), the index is not included in the generated JSON. If no_index is false, the index is included.

# File 'lib/daru/dataframe.rb', line 1773

def to_json no_index=true
  if no_index
    self.to_a[0].to_json
  else
    self.to_a.to_json
  end
end
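
The two output shapes can be sketched with Ruby's standard `json` library. The `rows` and `index` values below are made-up stand-ins for what `to_a` returns.

```ruby
require 'json'

# Stand-in for the two-element Array returned by to_a:
# element 0 holds the row hashes, element 1 holds the row indexes.
rows     = [{ a: 1, b: 11 }, { a: 2, b: 22 }]
index    = ['x', 'y']
as_array = [rows, index]

without_index = as_array[0].to_json  # corresponds to no_index = true
with_index    = as_array.to_json     # corresponds to no_index = false

puts without_index
puts with_index
```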

#to_matrix ⇒ `Object`

Convert all vectors of type :numeric into a Matrix.

# File 'lib/daru/dataframe.rb', line 1731

def to_matrix
  numerics_as_arrays = []
  each_vector do |vector|
    numerics_as_arrays << vector.to_a if(vector.type == :numeric)
  end

  Matrix.columns numerics_as_arrays
end

#to_nmatrix ⇒ `Object`

Convert all vectors of type :numeric and not containing nils into an NMatrix.

# File 'lib/daru/dataframe.rb', line 1746

def to_nmatrix
  numerics_as_arrays = []
  each_vector do |vector|
    numerics_as_arrays << vector.to_a if(vector.type == :numeric and
      vector.missing_positions.size == 0)
  end

  numerics_as_arrays.transpose.to_nm
end

#to_nyaplotdf ⇒ `Object`

Return a Nyaplot::DataFrame from the data of this DataFrame.




# File 'lib/daru/dataframe.rb', line 1741

def to_nyaplotdf
  Nyaplot::DataFrame.new(to_a[0])
end

#to_REXP ⇒ `Object`

# File 'lib/daru/extensions/rserve.rb', line 5

def to_REXP
  names = @vectors.to_a
  data  = names.map do |f|
    Rserve::REXP::Wrapper.wrap(self[f].to_a)
  end
  l = Rserve::Rlist.new(data, names.map(&:to_s))

  Rserve::REXP.create_data_frame(l)
end

#to_s ⇒ `Object`




# File 'lib/daru/dataframe.rb', line 1834

def to_s
  to_html
end

#transpose ⇒ `Object`

Transpose a DataFrame, transposing the elements and swapping the row and column indexing.

# File 'lib/daru/dataframe.rb', line 1928

def transpose
  arrys = []
  each_vector do |vec|
    arrys << vec.to_a
  end

  Daru::DataFrame.new(arrys.transpose, index: @vectors, order: @index, dtype: @dtype, name: @name)
end
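
The label swap in the source above can be illustrated on plain arrays. This is a minimal sketch that does not use Daru; the LabelledTable struct and transpose_table helper are hypothetical names chosen for the example.

```ruby
# Hypothetical stand-in for a labelled table: column arrays plus
# row labels (index) and column labels (vectors).
LabelledTable = Struct.new(:columns, :index, :vectors)

def transpose_table(table)
  # Array#transpose turns the column arrays into row arrays. Treating
  # those rows as the new columns and swapping the labels completes the flip.
  LabelledTable.new(table.columns.transpose, table.vectors, table.index)
end

t = LabelledTable.new([[1, 2, 3], [4, 5, 6]], [:r1, :r2, :r3], [:a, :b])
flipped = transpose_table(t)
p flipped.columns   # old rows, now stored as columns
p flipped.index     # old column labels
p flipped.vectors   # old row labels
```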

#update ⇒ `Object`

Method for updating the metadata (i.e. missing value positions) of the DataFrame after assignment/deletion etc. are complete. This is provided so that time is not wasted in recreating the metadata for each vector every time an element is assigned or deleted. Deferring the update this way is called lazy loading. To set or unset lazy loading, see the .lazy_update= method.




# File 'lib/daru/dataframe.rb', line 1843

def update
  @data.each { |v| v.update } if Daru.lazy_update
end
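
The idea behind deferring metadata updates can be sketched with a small stand-alone class. This is not Daru's implementation: LazyVector and its lazy: option are hypothetical, and unlike the method above, its #update refreshes unconditionally.

```ruby
# Hypothetical miniature of a vector whose missing-value positions
# are recomputed on demand when lazy mode is on.
class LazyVector
  attr_reader :missing_positions

  def initialize(data, lazy: false)
    @data = data
    @lazy = lazy
    refresh
  end

  def []=(pos, value)
    @data[pos] = value
    refresh unless @lazy   # eager mode pays this cost on every write
  end

  # Counterpart of #update: bring the metadata back in sync.
  def update
    refresh
  end

  private

  def refresh
    @missing_positions = @data.each_index.select { |i| @data[i].nil? }
  end
end

v = LazyVector.new([1, 2, 3], lazy: true)
v[1] = nil
p v.missing_positions   # stale until #update is called
v.update
p v.missing_positions   # now reflects the nil at position 1
```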

#vector(*args) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 353

def vector *args
  $stderr.puts "#vector has been deprecated in favour of #[]. Please use that."
  self[*args]
end

#vector_by_calculation(&block) ⇒ `Object`

DSL for yielding each row and returning a Daru::Vector based on the value each run of the block returns.

Usage

a1 = Daru::Vector.new([1, 2, 3, 4, 5, 6, 7])
a2 = Daru::Vector.new([10, 20, 30, 40, 50, 60, 70])
a3 = Daru::Vector.new([100, 200, 300, 400, 500, 600, 700])
ds = Daru::DataFrame.new({ :a => a1, :b => a2, :c => a3 })
total = ds.vector_by_calculation { a + b + c }
# <Daru::Vector:82314050 @name = nil @size = 7 >
#   nil
# 0 111
# 1 222
# 2 333
# 3 444
# 4 555
# 5 666
# 6 777

# File 'lib/daru/dataframe.rb', line 1008

def vector_by_calculation &block
  a = []
  each_row do |r|
    a.push r.instance_eval(&block)
  end

  Daru::Vector.new a, index: @index
end
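
The block can refer to a, b and c directly because instance_eval runs it with each row as self. A minimal sketch of that mechanism in plain Ruby, using a hypothetical CalcRow struct and column_by_calculation helper in place of Daru's row vectors:

```ruby
# Hypothetical row type whose fields are readable as bare method names,
# which is what lets the block say `a + b + c`.
CalcRow = Struct.new(:a, :b, :c)

def column_by_calculation(rows, &block)
  # instance_eval evaluates the block with each row as `self`.
  rows.map { |row| row.instance_eval(&block) }
end

rows = [CalcRow.new(1, 10, 100), CalcRow.new(2, 20, 200)]
p column_by_calculation(rows) { a + b + c }   # one value per row
```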

#vector_count_characters(vecs = nil) ⇒ `Object`

# File 'lib/daru/dataframe.rb', line 1087

def vector_count_characters vecs=nil
  vecs ||= @vectors.to_a

  collect_row_with_index do |row, i|
    vecs.inject(0) do |memo, vec|
      memo + (row[vec].nil? ? 0 : row[vec].to_s.size)
    end
  end
end
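
Judging by the source above, the method returns, for each row, the total number of characters across the given vectors, counting nil as zero. The same logic on plain hashes, as a sketch (count_characters is a hypothetical helper, not part of Daru):

```ruby
# Hypothetical helper: rows are hashes keyed by vector name.
def count_characters(rows, vecs = nil)
  rows.map do |row|
    (vecs || row.keys).inject(0) do |memo, vec|
      memo + (row[vec].nil? ? 0 : row[vec].to_s.size)
    end
  end
end

rows = [{ name: 'Ann', city: 'Oslo' }, { name: 'Bo', city: nil }]
p count_characters(rows)            # lengths over all fields
p count_characters(rows, [:name])   # lengths over :name only
```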

#vector_mean(max_missing = 0) ⇒ `Object`

Calculate mean of the rows of the dataframe.

Arguments

max_missing - The maximum number of elements in the row that can be missing for the mean calculation to happen. Defaults to 0.

# File 'lib/daru/dataframe.rb', line 1203

def vector_mean max_missing=0
  mean_vec = Daru::Vector.new [0]*@size, index: @index, name: "mean_#{@name}"

  each_row_with_index do |row, i|
    mean_vec[i] = row.missing_positions.size > max_missing ? nil : row.mean
  end

  mean_vec
end
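
The effect of max_missing can be seen on plain arrays. This sketch does not use Daru; row_means is a hypothetical helper, and it assumes that the mean of a row is taken over its non-missing values only.

```ruby
# Hypothetical helper: each row is an array in which nil marks a
# missing value. Rows with too many missing values yield nil.
def row_means(rows, max_missing = 0)
  rows.map do |row|
    present = row.compact
    missing = row.size - present.size
    next nil if missing > max_missing || present.empty?

    present.sum.to_f / present.size
  end
end

rows = [[1, 2, 3], [4, nil, 6], [nil, nil, 9]]
p row_means(rows)      # only the complete row gets a mean
p row_means(rows, 1)   # rows with at most one missing value get a mean
```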

#vector_sum(vecs = nil) ⇒ `Object`

Returns a vector with the sum of all vectors specified in the argument. If the vecs parameter is nil, sums all numeric vectors.

# File 'lib/daru/dataframe.rb', line 1186

def vector_sum vecs=nil
  vecs ||= numeric_vectors
  sum = Daru::Vector.new [0]*@size, index: @index, name: @name, dtype: @dtype

  vecs.each do |n|
    sum += self[n]
  end

  sum
end
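
The accumulation in the source above starts from a vector of zeros and adds each selected vector element-wise. A sketch of the same idea on a hash of column arrays (sum_columns and the data layout are assumptions made for illustration):

```ruby
# Hypothetical helper: columns is a hash of name => array of values.
# With no names given, every fully numeric column is summed.
def sum_columns(columns, vecs = nil)
  vecs ||= columns.keys.select do |name|
    columns[name].all? { |v| v.is_a?(Numeric) }
  end
  size = columns.values.first.to_a.size

  vecs.inject([0] * size) do |sum, name|
    sum.zip(columns[name]).map { |s, v| s + v }
  end
end

columns = { a: [1, 2, 3], b: [10, 20, 30], label: %w[x y z] }
p sum_columns(columns)         # sums :a and :b, skips :label
p sum_columns(columns, [:a])   # sums only :a
```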

#verify(*tests) ⇒ `Object`

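Test each row with one or more tests. Judging from the source, each test is a three-element array of the form [message, vector_names, proc], where the Proc receives the row, e.g. *Proc.new {|row| row[:age] > 0}*, and vector_names lists the values to report when the test fails. If the first argument is a Symbol, it names the vector used to identify rows in the error messages; otherwise the first vector is used.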

The function returns an array with all errors.

# File 'lib/daru/dataframe.rb', line 964

def verify(*tests)
  if(tests[0].is_a? Symbol)
    id = tests[0]
    tests.shift
  else
    id = @vectors.first
  end

  vr = []
  i  = 0
  each(:row) do |row|
    i += 1
    tests.each do |test|
      if !test[2].call(row)
        values = ""
        if test[1].size>0
          values = " (" + test[1].collect{ |k| "#{k}=#{row[k]}" }.join(", ") + ")"
        end
        vr.push("#{i} [#{row[id]}]: #{test[0]}#{values}")
      end
    end
  end
  vr
end
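
The shape of a test and of the resulting error strings can be sketched on plain row hashes. This example does not call Daru; verify_rows is a hypothetical stand-in that follows the logic of the source above, with the identifying field passed explicitly.

```ruby
# Hypothetical stand-in for #verify over an array of row hashes.
# Each test is [message, fields_to_report, proc].
def verify_rows(rows, id, *tests)
  errors = []
  rows.each_with_index do |row, i|
    tests.each do |message, fields, check|
      next if check.call(row)

      reported = fields.map { |k| "#{k}=#{row[k]}" }.join(', ')
      values = fields.empty? ? '' : " (#{reported})"
      errors << "#{i + 1} [#{row[id]}]: #{message}#{values}"
    end
  end
  errors
end

rows = [{ id: 1, age: 30 }, { id: 2, age: -5 }]
positive_age = ['age must be positive', [:age], ->(row) { row[:age] > 0 }]
p verify_rows(rows, :id, positive_age)   # one error, for the second row
```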

#where(bool_array) ⇒ `Object`

Query a DataFrame by passing a Daru::Core::Query::BoolArray object.




# File 'lib/daru/dataframe.rb', line 1972

def where bool_array
  Daru::Core::Query.df_where self, bool_array
end

#write_csv(filename, opts = {}) ⇒ `Object`

Write this DataFrame to a CSV file.

Arguments

filename - Path of CSV file where the DataFrame is to be saved.

Options

convert_comma - If set to true, will convert any commas in any of the data to full stops (‘.’).

All the options accepted by CSV.read() can also be passed into this function.




# File 'lib/daru/dataframe.rb', line 1864

def write_csv filename, opts={}
  Daru::IO.dataframe_write_csv self, filename, opts
end

#write_excel(filename, opts = {}) ⇒ `Object`

Write this DataFrame to an Excel spreadsheet.

Arguments

filename - The path of the file where the DataFrame should be written.




# File 'lib/daru/dataframe.rb', line 1873

def write_excel filename, opts={}
  Daru::IO.dataframe_write_excel self, filename, opts
end

#write_sql(dbh, table) ⇒ `Object`

Insert each case (row) of the DataFrame into the selected table.

Arguments

dbh - DBI database connection object.
table - Name of the table into which the rows are inserted.

Usage

ds = Daru::DataFrame.new({:id=>Daru::Vector.new([1,2,3]), :name=>Daru::Vector.new(["a","b","c"])})
dbh = DBI.connect("DBI:Mysql:database:localhost", "user", "password")
ds.write_sql(dbh,"test")




# File 'lib/daru/dataframe.rb', line 1889

def write_sql dbh, table
  Daru::IO.dataframe_write_sql self, dbh, table
end
