Class: Daru::IO::Importers::CSV

Inherits:
Base
  • Object
Defined in:
lib/daru/io/importers/csv.rb

Overview

CSV importer class that adds the read_csv method to Daru::DataFrame.

Constant Summary

CONVERTERS =
{
  boolean: lambda { |f, _|
    case f.downcase.strip
    when 'true'  then true
    when 'false' then false
    else f
    end
  }
}.freeze
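The :boolean entry follows the custom-converter protocol of Ruby's standard CSV library: a two-argument converter receives the field and a CSV::FieldInfo, and its return value replaces the field. The sketch below reuses the same lambda with CSV.parse directly, paired with the built-in :numeric converter. It does not show how daru-io itself wires CONVERTERS into its defaults; that part is not covered on this page.

```ruby
require 'csv'

# Same lambda as CONVERTERS[:boolean]: CSV calls two-argument converters
# with (field, field_info); the field_info argument is ignored here.
boolean = lambda { |f, _|
  case f.downcase.strip
  when 'true'  then true
  when 'false' then false
  else f
  end
}

# Converters run left to right and stop once a field is no longer a String,
# so "1" becomes an Integer before the boolean lambda ever sees it.
rows = CSV.parse("TRUE,x\n false ,1\n", converters: [:numeric, boolean])
```

Here rows is [[true, "x"], [false, 1]]: case and surrounding whitespace are ignored when matching true/false, and any other string passes through unchanged.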

Methods inherited from Base

#optional_gem

Constructor Details

#initialize ⇒ CSV

Loads the libraries the CSV importer depends on (csv, open-uri and zlib).

# File 'lib/daru/io/importers/csv.rb', line 21

def initialize
  require 'csv'
  require 'open-uri'
  require 'zlib'
end

Class Method Details

.read(path) ⇒ Daru::IO::Importers::CSV

Reads data from a CSV or gzip-compressed CSV (.csv.gz) file.

Examples:

Reading from csv file

instance = Daru::IO::Importers::CSV.read("matrix_test.csv")

Reading from csv.gz file

instance = Daru::IO::Importers::CSV.read("matrix_test.csv.gz")

Parameters:

  • path (String)

    Path to the .csv or .csv.gz file from which the dataframe is to be imported.

Returns:

  • (Daru::IO::Importers::CSV)

    The importer instance itself (read returns self).

# File 'lib/daru/io/importers/csv.rb', line 41

def read(path)
  @path      = path
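  # URI.open handles local paths as well as http(s) URLs; plain Kernel#open
  # stopped delegating to open-uri in Ruby 3.0.
  @file_data = URI.open(@path)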
  self
end
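The read-then-call flow can be sketched with the standard library alone. TinyCSVImporter below is a hypothetical stand-in, not daru-io's class: it assumes the class-level .read simply instantiates the importer and delegates to #read (as the shared source listing suggests), and its #call returns parsed CSV rows where the real #call builds a Daru::DataFrame.

```ruby
require 'csv'
require 'stringio'

# Hypothetical stand-in (not part of daru-io) for the read-then-call pattern.
class TinyCSVImporter
  # Assumed delegation: build an instance and forward to #read.
  def self.read(path_or_io)
    new.read(path_or_io)
  end

  # Stores an opened IO and returns self so that #call can be chained.
  def read(path_or_io)
    @file_data = path_or_io.is_a?(String) ? File.open(path_or_io) : path_or_io
    self
  end

  # Parses the stored IO with Ruby's CSV library, then closes it.
  def call(**csv_options)
    CSV.parse(@file_data.read, **csv_options)
  ensure
    @file_data.close
  end
end

rows = TinyCSVImporter.read(StringIO.new("x,y\n1,2\n"))
                      .call(headers: true, converters: :numeric)
```

Returning self from #read is what makes the one-line Importer.read(path).call(...) style possible; the parsing options are only supplied at #call time.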

Instance Method Details

#call(headers: nil, skiprows: 0, compression: :infer, clone: nil, index: nil, order: nil, name: nil, **options) ⇒ Daru::DataFrame

Imports a Daru::DataFrame from a CSV Importer instance

Examples:

Calling with csv options

df = instance.call(col_sep: ' ', headers: true)

#=> #<Daru::DataFrame(99x3)>
#        image_reso        mls true_trans
#      0    6.55779          0 -0.2362347
#      1    2.14746          0 -0.1539447
#      2    8.31104          0 0.3832846,
#      3    3.47872          0 0.3832846,
#      4    4.16725          0 -0.2362347
#      5    5.79983          0 -0.2362347
#      6     1.9058          0 -0.895577,
#      7     1.9058          0 -0.2362347
#      8    4.11806          0 -0.895577,
#      9    6.26622          0 -0.2362347
#     10    2.57805          0 -0.1539447
#     11    4.76151          0 -0.2362347
#     12    7.11002          0 -0.895577,
#     13    5.40811          0 -0.2362347
#     14    8.19567          0 -0.1539447
#    ...        ...        ...        ...

Calling with csv.gz options

df = instance.call(compression: :gzip, col_sep: ' ', headers: true)

#=> #<Daru::DataFrame(99x3)>
#        image_reso        mls true_trans
#      0    6.55779          0 -0.2362347
#      1    2.14746          0 -0.1539447
#      2    8.31104          0 0.3832846,
#      3    3.47872          0 0.3832846,
#      4    4.16725          0 -0.2362347
#      5    5.79983          0 -0.2362347
#      6     1.9058          0 -0.895577,
#      7     1.9058          0 -0.2362347
#      8    4.11806          0 -0.895577,
#      9    6.26622          0 -0.2362347
#     10    2.57805          0 -0.1539447
#     11    4.76151          0 -0.2362347
#     12    7.11002          0 -0.895577,
#     13    5.40811          0 -0.2362347
#     14    8.19567          0 -0.1539447
#    ...        ...        ...        ...

Parameters:

  • headers (Boolean) (defaults to: nil)

    If true, the first row is treated as the header row, and only columns that have a header are used to build the Daru::DataFrame.

  • skiprows (Integer) (defaults to: 0)

    Number of rows to skip at the start of the CSV file.

  • compression (Symbol) (defaults to: :infer)

    With the default :infer, compression is detected from the file extension (for example .csv.gz). To force the data to be parsed as gzip-compressed, set :compression to :gzip.

  • clone (Boolean) (defaults to: nil)

    Passed to Daru::DataFrame.new; see its :clone option.

  • index (Array or Daru::Index or Daru::MultiIndex) (defaults to: nil)

    Passed to Daru::DataFrame.new; see its :index option.

  • order (Array or Daru::Index or Daru::MultiIndex) (defaults to: nil)

    Passed to Daru::DataFrame.new; see its :order option. When :headers is not set, #call overwrites it with the keys of the parsed column hash.

  • name (String) (defaults to: nil)

    Passed to Daru::DataFrame.new; see its :name option.

  • options (Hash)

    CSV standard library options such as :col_sep (defaults to ','), :converters (defaults to :numeric), :header_converters (defaults to :symbol).
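The :compression and :skiprows options can be approximated with Ruby's standard library. read_csv_rows is a hypothetical helper, not daru-io's API (its private process_compression is not shown on this page): it treats a file as gzip when compression: :gzip is given or, under :infer, when the path ends in .gz, and it skips physical lines before parsing, which is an assumption about how :skiprows counts rows.

```ruby
require 'csv'
require 'zlib'

# Hypothetical helper (not part of daru-io) approximating the documented
# :compression and :skiprows options with the standard library only.
def read_csv_rows(path, compression: :infer, skiprows: 0, **csv_options)
  # :gzip forces decompression; :infer decides from the file extension.
  gzip = compression == :gzip || (compression == :infer && path.end_with?('.gz'))
  text = gzip ? Zlib::GzipReader.open(path, &:read) : File.read(path)

  # Assumption: skiprows counts physical lines, so a quoted field that
  # spans several lines would be miscounted.
  text = text.lines.drop(skiprows).join
  CSV.parse(text, **csv_options)
end
```

Any remaining keyword arguments (col_sep, converters, and so on) are forwarded to CSV.parse, mirroring how the importer forwards **options to the CSV library.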

Returns:

  • (Daru::DataFrame)

    The imported dataframe.

# File 'lib/daru/io/importers/csv.rb', line 115

def call(headers: nil, skiprows: 0, compression: :infer,
  clone: nil, index: nil, order: nil, name: nil, **options)
  init_opts(headers: headers, skiprows: skiprows, compression: compression,
            clone: clone, index: index, order: order, name: name, **options)
  process_compression

  # Preprocess headers for detecting and correcting repetition in
  # case the :headers option is not specified.
  hsh =
    if @headers
      hash_with_headers
    else
      hash_without_headers.tap { |hash| @daru_options[:order] = hash.keys }
    end

  Daru::DataFrame.new(hsh, @daru_options)
end
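In the branch without :headers, #call builds a column-name-to-values hash and reuses its keys as :order. columns_from_csv is a hypothetical sketch of that hash shape, not the library's hash_without_headers: it reads the first row as column names, and the _2, _3 suffix used to make repeated names unique is an assumed scheme; daru-io's own renaming may differ.

```ruby
require 'csv'

# Hypothetical sketch (not daru-io's hash_without_headers): builds a
# { column_name => values } hash from CSV text whose first row holds names.
def columns_from_csv(text)
  table = CSV.parse(text, headers: true, converters: :numeric)

  # Assumed renaming scheme for repeated names: a, a_2, a_3, ...
  seen = Hash.new(0)
  names = table.headers.map do |name|
    seen[name] += 1
    seen[name] == 1 ? name : "#{name}_#{seen[name]}"
  end

  # Transpose the row-oriented table into one array per column.
  rows = table.map(&:fields)
  names.each_with_index.map { |name, i| [name, rows.map { |row| row[i] }] }.to_h
end
```

For example, columns_from_csv("a,a,b\n1,2,3\n") returns {"a"=>[1], "a_2"=>[2], "b"=>[3]}. Because Ruby hashes preserve insertion order, the hash's keys give the column order, which is what #call assigns to @daru_options[:order].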

#read(path) ⇒ Daru::IO::Importers::CSV

Reads data from a CSV or gzip-compressed CSV (.csv.gz) file.

Examples:

Reading from csv file

instance = Daru::IO::Importers::CSV.read("matrix_test.csv")

Reading from csv.gz file

instance = Daru::IO::Importers::CSV.read("matrix_test.csv.gz")

Parameters:

  • path (String)

    Path to the .csv or .csv.gz file from which the dataframe is to be imported.

Returns:

  • (Daru::IO::Importers::CSV)

    The importer instance itself (read returns self).

# File 'lib/daru/io/importers/csv.rb', line 41

def read(path)
  @path      = path
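  # URI.open handles local paths as well as http(s) URLs; plain Kernel#open
  # stopped delegating to open-uri in Ruby 3.0.
  @file_data = URI.open(@path)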
  self
end