Class: FileProcessor::CSV

Inherits:
SimpleDelegator
Includes:
Enumerable
Defined in:
lib/file_processor/csv.rb

Constructor Details

#initialize(filename, options = {}) ⇒ CSV

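Returns a new instance of CSV, a delegator wrapping a ::CSV parser over the loaded file. The :gzipped and :open_options entries are removed from options and used by the wrapper (:open_options is passed to load). The remaining entries are merged over default_options and passed to ::CSV.new. When :encoding or :col_sep is not given, it is detected from the file.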



# File 'lib/file_processor/csv.rb', line 22

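def initialize(filename, options={})
  # Wrapper-only options are taken out before the rest reach ::CSV.new.
  # Note: Hash#delete also removes them from the caller's hash.
  @gzipped      = options.delete(:gzipped)

  load(filename, options.delete(:open_options))

  @options      = default_options.merge(options)

  # Detect the encoding only when no :encoding option was given.
  @options[:encoding] ||= detect_encoding
  @detected_encoding  ||= Encoding.find(@options[:encoding])

  # Reopen the tempfile in the detected mode if it is closed.
  tempfile.reopen(detected_mode) if tempfile.closed?

  # Detect the column separator only when no :col_sep option was given.
  @options[:col_sep]  ||= detect_column_separator

  # Delegate all ::CSV methods to a parser over the loaded tempfile.
  super(::CSV.new(tempfile, @options))
end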
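
The constructor takes :gzipped and :open_options out of options with Hash#delete, which also removes them from the hash the caller passed in, so calling it twice with the same hash would lose :gzipped the second time. A minimal sketch of the same split performed on a copy, using a hypothetical split_options helper and a placeholder DEFAULTS hash in place of the gem's default_options (whose real contents are not shown here):

```ruby
# Placeholder for the gem's default_options; an assumption for illustration only.
DEFAULTS = { headers: false }.freeze

# Hypothetical helper: separates wrapper-only options from those meant for ::CSV.new.
def split_options(options = {})
  opts = options.dup                          # work on a copy, not the caller's hash
  gzipped      = opts.delete(:gzipped)        # consumed by the wrapper itself
  open_options = opts.delete(:open_options)   # in the original, passed to load
  csv_options  = DEFAULTS.merge(opts)         # caller-supplied values win over defaults
  [gzipped, open_options, csv_options]
end

caller_opts = { gzipped: true, open_options: { mode: "rb" }, col_sep: ";" }
gzipped, open_options, csv_options = split_options(caller_opts)
# gzipped      is true
# open_options is { mode: "rb" }
# csv_options  is { headers: false, col_sep: ";" }
# caller_opts still contains all three of its keys
```

Duplicating first costs one shallow copy and makes the options hash safe to reuse, for example when opening several files in a loop with the same settings.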

Instance Attribute Details

#detected_encoding ⇒ Object

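Returns the value of attribute detected_encoding: the Encoding the file is read with, taken from the :encoding option or detected from the file when that option is absent.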



# File 'lib/file_processor/csv.rb', line 20

def detected_encoding
  @detected_encoding
end
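
As the constructor shows, detected_encoding holds an Encoding object obtained with Encoding.find. A short sketch of how a caller might use such a value to transcode a field to UTF-8; the Latin-1 encoding name and the sample bytes are assumptions chosen for illustration:

```ruby
# Same lookup the constructor performs on @options[:encoding].
enc = Encoding.find("ISO-8859-1")

# Hypothetical raw field: "café" stored as Latin-1 bytes (0xE9 is "é").
raw = "caf\xE9".b.force_encoding(enc)

# Transcode to UTF-8 for the rest of the application.
utf8 = raw.encode(Encoding::UTF_8)
puts utf8            # prints café
puts utf8.encoding   # prints UTF-8
```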

Class Method Details

.open(*args) ⇒ Object

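Creates a new instance with the given arguments. With a block, yields the instance, closes it when the block exits (even if an exception is raised), and returns the block's value. Without a block, returns the open instance, which the caller must close.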



# File 'lib/file_processor/csv.rb', line 6

def self.open(*args)
  instance = new(*args)

  if block_given?
    begin
      yield instance
    ensure
      instance.close if instance
    end
  else
    instance
  end
end
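
.open follows the same block-with-ensure idiom as Ruby's File.open. The sketch below reproduces that control flow on a hypothetical TrackedResource class (not part of FileProcessor) so that it runs without the gem, and shows that the instance is closed even when the block raises:

```ruby
# Hypothetical stand-in for FileProcessor::CSV that records whether close was called.
class TrackedResource
  attr_reader :name

  def initialize(name)
    @name = name
    @closed = false
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end

  # Same shape as FileProcessor::CSV.open.
  def self.open(*args)
    instance = new(*args)
    return instance unless block_given?  # without a block, the caller must close it

    begin
      yield instance                     # the block's value becomes open's return value
    ensure
      instance.close                     # runs on normal exit, break, or exception
    end
  end
end

seen = nil
begin
  TrackedResource.open("data.csv") do |resource|
    seen = resource
    raise IOError, "simulated failure"
  end
rescue IOError
  # the exception still reaches the caller after close has run
end

puts seen.closed?                                           # prints true
puts TrackedResource.open("data.csv") { |r| r.name.upcase }  # prints DATA.CSV
```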

Instance Method Details

#each ⇒ Object

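Yields each remaining row of the data source in turn. When skip_blanks? is true, rows with no data are skipped. Without a block, returns an Enumerator.

This method backs the included Enumerable module, so map, select, count, and the other Enumerable methods iterate through it.

The data source must be open for reading. Rows are consumed with shift, so #each continues from the current read position and does not rewind.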



# File 'lib/file_processor/csv.rb', line 56

def each
  if block_given?
    while row = shift
      yield row unless skip_blanks? && row_with_no_data?(row)
    end
  else
    to_enum
  end
end
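
Because #each pulls rows with shift, it consumes the data source: a second pass yields nothing until the source is rewound. A self-contained sketch of that behaviour, using a hypothetical array-backed RowSource in place of ::CSV and assuming that a row has no data when every field is nil or whitespace (the gem's row_with_no_data? may be defined differently):

```ruby
# Hypothetical array-backed stand-in for the delegated ::CSV object.
class RowSource
  include Enumerable

  def initialize(rows, skip_blanks: true)
    @rows = rows
    @pos = 0
    @skip_blanks = skip_blanks
  end

  # Returns the next row, or nil once every row has been read.
  def shift
    row = @rows[@pos]
    @pos += 1 if row
    row
  end

  def rewind
    @pos = 0
  end

  def skip_blanks?
    @skip_blanks
  end

  # Assumed definition: every field is nil or blank.
  def row_with_no_data?(row)
    row.all? { |field| field.nil? || field.strip.empty? }
  end

  # Same loop as FileProcessor::CSV#each.
  def each
    return to_enum unless block_given?

    while (row = shift)
      yield row unless skip_blanks? && row_with_no_data?(row)
    end
  end
end

source = RowSource.new([["a", "1"], [nil, " "], ["b", "2"]])
p source.to_a   # prints [["a", "1"], ["b", "2"]]
p source.to_a   # prints [] because every row was already shifted
source.rewind
p source.count  # prints 2
```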

#gzipped? ⇒ Boolean

Returns true when the file is gzipped, as set through the :gzipped constructor option, and false otherwise.

Returns:

  • (Boolean)


# File 'lib/file_processor/csv.rb', line 97

def gzipped?
  @gzipped
end

#process_range(options = {}) ⇒ Enumerable

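Yields a range of rows from the CSV file to the block, together with each row's zero-based index. The data source is rewound before and after processing, so the range is counted from the first row regardless of any earlier reads. A block is required.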

Examples:

Process 1000 rows, starting from the row at index 2000

csv.process_range(offset: 2000, limit: 1000) do |row, index|
  # process range here
end

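Parameters:

  • options (Hash) (defaults to: {})

    A hash with :offset and/or :limit

Options Hash (options):

  • :offset (Integer)

    The zero-based index of the first row to yield (defaults to 0)

  • :limit (Integer)

    The maximum number of rows to yield; omit it, or pass a negative value, for no limit

Returns:

  • (Enumerable, nil)

    The CSV itself when every row was visited, or nil when iteration stopped early because :limit was reached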



# File 'lib/file_processor/csv.rb', line 79

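def process_range(options={})
  options ||= {}                     # tolerate an explicit nil

  offset = options[:offset] || 0     # zero-based index of the first row to yield
  limit  = options[:limit]  || -1    # a negative limit means no limit

  rewind
  each_with_index do |row, index|
    # Rows before offset are still read and parsed, just not yielded.
    next if index < offset
    break if limit >= 0 && index >= offset + limit

    yield row, index
  end
ensure
  rewind
end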
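
The offset/limit checks of #process_range can be tried out on a plain array, since each_with_index comes from Enumerable. A minimal sketch, assuming a hypothetical window helper that mirrors those checks and collects what the block would receive. Note that rows before offset are still iterated (and, in the real class, parsed), so a large offset costs a scan up to that point:

```ruby
# Hypothetical helper mirroring the offset/limit checks in #process_range.
# Returns the [row, index] pairs the block would have received.
def window(rows, offset: 0, limit: -1)
  picked = []
  rows.each_with_index do |row, index|
    next if index < offset                          # before the window: skip
    break if limit >= 0 && index >= offset + limit  # limit rows already yielded: stop
    picked << [row, index]
  end
  picked
end

rows = %w[r0 r1 r2 r3 r4]
p window(rows, offset: 1, limit: 2)  # prints [["r1", 1], ["r2", 2]]
p window(rows, offset: 3)            # prints [["r3", 3], ["r4", 4]]
p window(rows, limit: 0)             # prints []

# each_with_index returns its receiver, or nil when the block breaks out early;
# #process_range passes that value through as its own return value.
p([1, 2].each_with_index { |_row, _index| })                     # prints [1, 2]
p([1, 2].each_with_index { |_row, index| break if index.zero? }) # prints nil
```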

#total_count(&block) ⇒ Integer

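Counts the rows in the file, even if some have already been read: the data source is rewound before counting and again afterwards. An optional block is passed to Enumerable#count, so that only rows for which it returns true are counted. Rows skipped by #each (see skip_blanks?) are not counted.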

Returns:

  • (Integer)

    the number of rows in the file



# File 'lib/file_processor/csv.rb', line 42

def total_count(&block)
  rewind
  count(&block)
ensure
  rewind
end
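
Because #total_count rewinds on both sides of count, its result does not depend on how far the data source has already been read, and it leaves the read position at the start. A self-contained sketch using a hypothetical array-backed Rows class in place of the delegated ::CSV object:

```ruby
# Hypothetical array-backed source exposing the shift/rewind pair #total_count relies on.
class Rows
  include Enumerable

  def initialize(rows)
    @rows = rows
    @pos = 0
  end

  # Returns the next row, or nil once every row has been read.
  def shift
    row = @rows[@pos]
    @pos += 1 if row
    row
  end

  def rewind
    @pos = 0
  end

  def each
    return to_enum unless block_given?

    while (row = shift)
      yield row
    end
  end

  # Same body as FileProcessor::CSV#total_count.
  def total_count(&block)
    rewind
    count(&block)
  ensure
    rewind
  end
end

rows = Rows.new([["a", "1"], ["b", "2"], ["c", "3"]])
rows.shift                                    # read one row first
p rows.total_count                            # prints 3
p rows.total_count { |row| row[1].to_i > 1 }  # prints 2
p rows.shift                                  # prints ["a", "1"] since the position was reset
```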