Class: Sycsvpro::Merger

Inherits:
Object
Includes:
Dsl
Defined in:
lib/sycsvpro/merger.rb

Overview

Merge files based on common header columns

file1.csv

|    | 2010 | 2011 | 2012 | 2013 |
| -- | ---- | ---- | ---- | ---- |
| SP | 20   | 30   | 40   | 50   |
| RP | 30   | 40   | 50   | 60   |

file2.csv

|    | 2010 | 2011 | 2012 |
| -- | ---- | ---- | ---- |
| M  | m1   | m2   | m3   |
| N  | n1   | n2   | n3   |

merging results in

merge.csv

|    | 2010 | 2011 | 2012 | 2013 |
| -- | ---- | ---- | ---- | ---- |
| SP | 20   | 30   | 40   | 50   |
| RP | 30   | 40   | 50   | 60   |
| M  | m1   | m2   | m3   |      |
| N  | n1   | n2   | n3   |      |

Constant Summary

Constants included from Dsl

Dsl::COMMA_SPLITTER_REGEX

Instance Attribute Summary

Instance Method Summary

Methods included from Dsl

#clean_up, #params, #rows, #split_by_comma_regex, #str2utf8, #unstring, #write_to

Constructor Details

#initialize(options = {}) ⇒ Merger

Merge files based on common header columns

:call-seq:

Sycsvpro::Merger.new(outfile:       "out.csv",
                     files:         "file1.csv,file2.csv,filen.csv",
                     header:        "2010,2011,2012,2013,2014",
                     source_header: "(\\d{4}),(\\d{4}),(\\d{4})",
                     key:           "0,0,0").execute

Semantics

Merges the files file1.csv, file2.csv … based on the header columns 2010, 2011, 2012, 2013 and 2014, where the columns of each source file are identified by the regex (\d{4}). The first column of each result row is taken from column 0 of the respective source file (file1.csv and so on).

outfile

result is written to the outfile

files

list of files that get merged. In the result file the files are inserted in the sequence they are provided

header

header of the result file and key for assigning column values from source files to result file

source_header

pattern for each header of the source file to determine the column. The pattern is a regex without the enclosing slashes ‘/’

key

column of each source file whose value is used as the first column in the target file. The key is optional.
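
As a minimal sketch of how a source_header pattern could select columns, the following plain Ruby compiles the pattern string (given without enclosing slashes) and records where each matching header cell sits. The helper name matching_columns is hypothetical and not part of Sycsvpro; the semicolon separator is taken from the execute method.

```ruby
# Hypothetical helper, not part of Sycsvpro: returns a hash that maps each
# matching header name to its column index in the header line.
def matching_columns(header_line, pattern)
  # The pattern arrives as a plain string, so it is compiled here
  regex = Regexp.new(pattern)
  cells = header_line.chomp.split(";")
  cells.each_with_index.each_with_object({}) do |(cell, index), positions|
    match = regex.match(cell)
    # Prefer the first capture group, fall back to the whole match
    positions[match[1] || match[0]] = index if match
  end
end

positions = matching_columns(";2010;2011;2012", "(\\d{4})")
# positions maps "2010" to 1, "2011" to 2 and "2012" to 3
positions.each { |name, index| puts "#{name} is column #{index}" }
```

The empty first cell does not match the pattern, so only the year columns are recorded.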



# File 'lib/sycsvpro/merger.rb', line 73

def initialize(options = {})
  @outfile       = options[:outfile]
  @header_cols   = options[:header].split(',')
  @source_header = options[:source_header].split(',')
  @key           = options[:key] ? options[:key].split(',') : []
  @has_key       = !@key.empty?
  @files         = options[:files].split(',')
  if @source_header.count != @files.count
    raise "file count has to be equal to source_header count"
  end
end
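
The option handling can be tried out without the gem. The sketch below mirrors the constructor in a hypothetical stand-in function, parse_merge_options, which is not part of Sycsvpro, and shows the error raised when the number of patterns differs from the number of files.

```ruby
# Hypothetical stand-in that mirrors the option handling of the constructor.
def parse_merge_options(options = {})
  header_cols    = options[:header].split(",")
  source_headers = options[:source_header].split(",")
  keys           = options[:key] ? options[:key].split(",") : []
  files          = options[:files].split(",")
  # One source_header pattern is required per file
  if source_headers.count != files.count
    raise "file count has to be equal to source_header count"
  end
  { header_cols: header_cols, source_headers: source_headers,
    keys: keys, has_key: !keys.empty?, files: files }
end

parsed = parse_merge_options(files:         "file1.csv,file2.csv",
                             header:        "2010,2011,2012",
                             source_header: "(\\d{4}),(\\d{4})",
                             key:           "0,0")
puts parsed[:files].inspect

begin
  # Three files but only two patterns
  parse_merge_options(files:         "file1.csv,file2.csv,file3.csv",
                      header:        "2010,2011,2012",
                      source_header: "(\\d{4}),(\\d{4})")
rescue RuntimeError => e
  puts e.message
end
```

Because source_header is split on commas, a pattern that itself contains a comma, such as \d{2,4}, would be split into two entries.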

Instance Attribute Details

#files ⇒ Object (readonly)

files to be merged based on header columns



# File 'lib/sycsvpro/merger.rb', line 44

def files
  @files
end

#header_cols ⇒ Object (readonly)

header columns



# File 'lib/sycsvpro/merger.rb', line 40

def header_cols
  @header_cols
end

#key ⇒ Object (readonly)

value that is used as the first column of a row



# File 'lib/sycsvpro/merger.rb', line 42

def key
  @key
end

#outfile ⇒ Object (readonly)

file that the result is written to



# File 'lib/sycsvpro/merger.rb', line 36

def outfile
  @outfile
end

#source_header ⇒ Object (readonly)

header patterns to be used to identify merge columns



# File 'lib/sycsvpro/merger.rb', line 38

def source_header
  @source_header
end

Instance Method Details

#execute ⇒ Object

Merges the files based on the provided parameters



# File 'lib/sycsvpro/merger.rb', line 86

def execute
  File.open(outfile, 'w') do |out|
    out.puts "#{';' unless @key.empty?}#{header_cols.join(';')}"
    files.each do |file|
      @current_key = create_current_key
      @current_source_header = @source_header.shift
      processed_header = false
      File.open(file).each_with_index do |line, index|
        next if line.chomp.empty?

        unless processed_header
          create_file_header unstring(line).split(';')
          processed_header = true
          next
        end

        out.puts create_line unstring(line).split(';')
      end
    end
  end
end
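
The helpers create_file_header and create_line are not shown on this page, so the following is only a sketch of what the loop above appears to do, written in plain Ruby without the gem. The function merge_files and its column mapping are assumptions: the first non-empty line of each file is treated as its header, blank lines are skipped, and columns missing in a source file are left empty.

```ruby
require "tmpdir"

# Hypothetical re-implementation, not the library code. patterns and
# key_cols hold one entry per file, in the same order as files.
def merge_files(outfile, files, header_cols, patterns, key_cols)
  File.open(outfile, "w") do |out|
    # Leading ';' leaves room for the key column, as in #execute
    out.puts "#{';' unless key_cols.empty?}#{header_cols.join(';')}"
    files.each_with_index do |file, file_index|
      regex     = Regexp.new(patterns[file_index])
      key_col   = key_cols[file_index]
      positions = nil
      File.foreach(file) do |line|
        next if line.chomp.empty?
        cells = line.chomp.split(";", -1)
        if positions.nil?
          # Header line: find the source column for each target column
          positions = header_cols.map do |col|
            cells.index { |cell| (m = regex.match(cell)) && m[1] == col }
          end
          next
        end
        values = positions.map { |pos| pos ? cells[pos] : "" }
        values.unshift(cells[key_col.to_i]) if key_col
        out.puts values.join(";")
      end
    end
  end
end

Dir.mktmpdir do |dir|
  file1 = File.join(dir, "file1.csv")
  file2 = File.join(dir, "file2.csv")
  out   = File.join(dir, "merge.csv")
  File.write(file1, ";2010;2011;2012;2013\nSP;20;30;40;50\nRP;30;40;50;60\n")
  File.write(file2, ";2010;2011;2012\n\nM;m1;m2;m3\nN;n1;n2;n3\n")
  merge_files(out, [file1, file2], %w[2010 2011 2012 2013],
              ["(\\d{4})", "(\\d{4})"], %w[0 0])
  puts File.read(out)
end
```

Unlike the original, this sketch indexes the patterns by file position instead of calling shift on @source_header, so it can be run more than once with the same arguments.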