Large CSV Reader Gem

This gem addresses a well-known problem: processing large CSV files in Ruby without exhausting RAM. It uses lazy enumeration, which lets each method act on one line at a time instead of loading millions of lines into memory at once.
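To make the idea concrete, here is a minimal sketch using Ruby's standard CSV library (not this gem's internals): the lazy chain pulls one parsed row at a time, so memory use stays flat no matter how large the file is. The file name books.csv and its Price column are illustrative assumptions.

    require 'csv'

    # CSV.foreach (without a block) returns an enumerator that parses one
    # row per iteration; .lazy keeps the map step per-row as well, so the
    # whole file is never held in memory at once.
    total = CSV.foreach('books.csv', headers: true)
               .lazy
               .map { |row| row['Price'].to_f }
               .sum

    puts total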

I consider this gem to be in a beta state: a proper test suite is still missing, and there are improvements and extensions that would be useful in a full release.

Installation

gem install large_csv_reader

  • In your files: require 'large_csv_reader'

Usage

General Methods

  • reader = LargeCsvReader.new: creates a new instance of the reader
  • reader.generate_csv(fileName, columnNames): creates a new CSV file with the given name and header columns
  • reader.append_to_csv(filename, rows=1000000, rowStructure): appends lines to the CSV, generated from the rowStructure array parameter; if the rows parameter is omitted, it defaults to 1,000,000 lines
  • reader.massive_read_in_csv_data(file_name): lazily loads each CSV row into a list, one row at a time
  • reader.massive_csv_builder(filename, column_names, rowMult="1"): creates a CSV with millions of lines; rowMult is the number of millions of lines the file will contain
  • reader.row_generator(structure): generates rows on demand using enumeration
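Putting the general methods together, a hypothetical end-to-end run might look like the sketch below. The shape of rowStructure (one value generator per column) is an assumption based on the descriptions above, not documented behavior; check the gem source for the exact contract.

    require 'large_csv_reader'

    reader = LargeCsvReader.new

    # Create an empty CSV with a header row.
    reader.generate_csv('books.csv', ['Date', 'ISBN', 'Price'])

    # Append 1,000 generated rows. One lambda per column is assumed here.
    row_structure = [
      -> { Time.now.strftime('%Y-%m-%d') },   # Date
      -> { rand(10**13).to_s },               # ISBN
      -> { format('%.2f', rand * 100) }       # Price
    ]
    reader.append_to_csv('books.csv', 1_000, row_structure)

    # Read back lazily; first(5) pulls only five rows off the enumerator.
    reader.massive_read_in_csv_data('books.csv').first(5).each do |row|
      puts row.inspect
    end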

Specific Methods

The rest of the methods solve a sample problem involving book data with the columns "Date", "ISBN", and "Price":

  • massive_total_value_in_stock(csv_file_name)
  • massive_number_of_each_isbn(csv_file_name)
  • append_book_to_csv(filename, rows=1000000)
  • book_generator
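As with the general methods, here is a hedged sketch of how the book-specific helpers might be combined. The printed return values and their formats are assumptions based on the method names, not verified output.

    require 'large_csv_reader'

    reader = LargeCsvReader.new

    # Build a book CSV and fill it with generated rows; 10_000 overrides
    # the one-million default of the rows parameter.
    reader.generate_csv('books.csv', ['Date', 'ISBN', 'Price'])
    reader.append_book_to_csv('books.csv', 10_000)

    # Both aggregations are expected to walk the file one row at a time.
    puts reader.massive_total_value_in_stock('books.csv')  # total price of all rows
    puts reader.massive_number_of_each_isbn('books.csv')   # count per ISBN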