Class: Remi::Parser::CsvFile

Inherits:
Remi::Parser
Includes:
DataSubject::CsvFile
Defined in:
lib/remi/data_subjects/csv_file.rb

Overview

CsvFile parser. Parses one or more extracted CSV files into a single dataframe.

Examples:


class MyJob < Remi::Job
  source :some_file do
    extractor Remi::Extractor::LocalFile.new(
      remote_path: 'some_file.csv'
    )
    parser Remi::Parser::CsvFile.new(
      csv_options: {
        headers: true,
        col_sep: '|'
      }
    )
  end
end

job = MyJob.new
job.some_file.df
# =>#<Daru::DataFrame:70153153438500 @name = 4c59cfdd-7de7-4264-8666-83153f46a9e4 @size = 3>
#                    id       name
#          0          1     Albert
#          1          2      Betsy
#          2          3       Camu

Instance Attribute Summary

Attributes inherited from Remi::Parser

#context, #field_symbolizer, #fields, #logger

Instance Method Summary

Constructor Details

#initialize(*args, **kargs, &block) ⇒ CsvFile

Returns a new instance of CsvFile.

Parameters:

  • csv_options (Hash)

    Standard Ruby CSV parsing options.

  • filename_field (Symbol)

    Name of the field in which to store the filename of the CSV being parsed (default: nil, meaning no field is added)

  • preprocessor (Proc)

    A proc used to pre-process lines of the CSV file before they are parsed



# File 'lib/remi/data_subjects/csv_file.rb', line 59

def initialize(*args, **kargs, &block)
  super
  init_csv_file(*args, **kargs, &block)
end
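
As a rough illustration of how csv_options and preprocessor could work together, the sketch below uses only Ruby's standard CSV library rather than Remi itself. It assumes the preprocessor proc receives one raw line at a time and returns the line to be parsed; preprocess_lines and strip_quotes are hypothetical names introduced for this example and are not part of Remi.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical stand-in for a preprocessing step (not Remi's implementation):
# apply a proc to each raw line and write the results to a temporary file.
def preprocess_lines(path, preprocessor)
  out = Tempfile.new(['preprocessed', '.csv'])
  File.foreach(path) { |line| out.write(preprocessor.call(line)) }
  out.close
  out
end

# Sample pipe-delimited input containing a stray double quote.
input = Tempfile.new(['raw', '.csv'])
input.write("id|name\n1|Al\"bert\n2|Betsy\n")
input.close

# Assumed proc shape: one line in, one cleaned line out.
strip_quotes = ->(line) { line.delete('"') }
cleaned = preprocess_lines(input.path, strip_quotes)

# The same option names as in the class example are standard Ruby CSV options.
csv_options = { headers: true, col_sep: '|' }
rows = CSV.read(cleaned.path, **csv_options).map(&:to_h)
```

Cleaning line by line before parsing keeps the CSV options themselves simple, at the cost of writing an intermediate file.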

Instance Attribute Details

#csv_options ⇒ Hash (readonly)

Returns the CSV options hash.

Returns:

  • (Hash)

    CSV options hash



# File 'lib/remi/data_subjects/csv_file.rb', line 65

def csv_options
  @csv_options
end

Instance Method Details

#parse(data) ⇒ Remi::DataFrame

Converts a list of filenames into a single dataframe, parsing each file according to the CSV options that were set. All files are assumed to have exactly the same structure.

Parameters:

  • data (Object)

    Extracted data that needs to be parsed

Returns:

  • (Remi::DataFrame)


# File 'lib/remi/data_subjects/csv_file.rb', line 71

def parse(data)
  # Assumes that each file has exactly the same structure
  result_df = nil
  Array(data).each_with_index do |filename, idx|
    filename = filename.to_s

    logger.info "Converting #{filename} to a dataframe"
    processed_filename = preprocess(filename)
    csv_df = Daru::DataFrame.from_csv processed_filename, @csv_options

    csv_df[@filename_field] = Daru::Vector.new([filename] * csv_df.size, index: csv_df.index) if @filename_field
    if idx == 0
      result_df = csv_df
    else
      result_df = result_df.concat csv_df
    end
  end

  Remi::DataFrame.create(:daru, result_df)
end
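
The loop above can be approximated without Remi or Daru. The following is a minimal sketch, assuming plain arrays of hashes in place of dataframes; parse_files is a hypothetical helper, not part of the library. It shows the same three steps: coerce the input to a list of filenames, tag each row with its source file when a filename field is given, and concatenate the per-file results.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper mirroring the shape of #parse with plain hashes.
# Assumes every file has the same columns, as the original comment notes.
def parse_files(data, csv_options:, filename_field: nil)
  Array(data).flat_map do |filename|
    filename = filename.to_s
    CSV.read(filename, **csv_options).map do |row|
      record = row.to_h
      # Tag each row with the file it came from, if requested.
      record[filename_field] = filename if filename_field
      record
    end
  end
end

dir = Dir.mktmpdir
first = File.join(dir, 'a.csv')
second = File.join(dir, 'b.csv')
File.write(first, "id|name\n1|Albert\n2|Betsy\n")
File.write(second, "id|name\n3|Camu\n")

records = parse_files(
  [first, second],
  csv_options: { headers: true, col_sep: '|' },
  filename_field: :source_file
)
```

Because of Array(data), a single filename works as well as a list of filenames.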