Class: Sycsvpro::Merger
Overview
Merge files based on common header columns
file1.csv
|    | 2010 | 2011 | 2012 | 2013 |
| -- | ---- | ---- | ---- | ---- |
| SP | 20   | 30   | 40   | 50   |
| RP | 30   | 40   | 50   | 60   |
file2.csv
|    | 2010 | 2011 | 2012 |
| -- | ---- | ---- | ---- |
| M  | m1   | m2   | m3   |
| N  | n1   | n2   | n3   |
merging results in
merge.csv
|    | 2010 | 2011 | 2012 | 2013 |
| -- | ---- | ---- | ---- | ---- |
| SP | 20   | 30   | 40   | 50   |
| RP | 30   | 40   | 50   | 60   |
| M  | m1   | m2   | m3   |      |
| N  | n1   | n2   | n3   |      |
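The alignment shown in the tables above can be sketched in plain Ruby. This is only an illustration of the idea, not the Merger implementation: `merge_by_header` and the in-memory hash layout are hypothetical names introduced here, and the sketch assumes that a column missing from a source file is left empty in the result.

```ruby
# Hypothetical helper, not part of Sycsvpro: aligns each source row to the
# target header by column name and leaves missing columns empty.
def merge_by_header(target_header, sources)
  sources.flat_map do |source|
    source[:rows].map do |key, values|
      by_column = source[:header].zip(values).to_h
      [key] + target_header.map { |col| by_column.fetch(col, "") }
    end
  end
end

# In-memory stand-ins for file1.csv and file2.csv from the overview.
file1 = { header: %w[2010 2011 2012 2013],
          rows: { "SP" => %w[20 30 40 50], "RP" => %w[30 40 50 60] } }
file2 = { header: %w[2010 2011 2012],
          rows: { "M" => %w[m1 m2 m3], "N" => %w[n1 n2 n3] } }

merged = merge_by_header(%w[2010 2011 2012 2013], [file1, file2])
merged.each { |row| puts row.join(";") }
```

Rows appear in the order the sources are passed in, which matches the description that files are inserted in the sequence they are provided.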
Constant Summary
Constants included from Dsl
Instance Attribute Summary
-
#files ⇒ Object
readonly
files to be merged based on header columns.
-
#header_cols ⇒ Object
readonly
header columns.
-
#key ⇒ Object
readonly
value that is used as the first column of a row.
-
#outfile ⇒ Object
readonly
file that the result is written to.
-
#source_header ⇒ Object
readonly
header patterns to be used to identify merge columns.
Instance Method Summary
-
#execute ⇒ Object
Merges the files based on the provided parameters.
-
#initialize(options = {}) ⇒ Merger
constructor
Merge files based on common header columns.
Methods included from Dsl
#clean_up, #params, #rows, #split_by_comma_regex, #str2utf8, #unstring, #write_to
Constructor Details
#initialize(options = {}) ⇒ Merger
Merge files based on common header columns
:call-seq:
Sycsvpro::Merger.new(outfile: "out.csv",
files: "file1.csv,file2.csv,filen.csv",
header: "2010,2011,2012,2013,2014",
source_header: "(\\d{4}),(\\d{4}),(\\d{4})",
key: "0,0,0").execute
Semantics
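Merges the files file1.csv, file2.csv … based on the header columns 2010, 2011, 2012, 2013 and 2014, where the columns of each source file are identified by the regex /(\d{4})/. One pattern is required per file. The first column of each result row is taken from column 0 of the respective source file (column 0 of file1.csv for its rows, and so on).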
- outfile: the result is written to the outfile
- files: list of files that get merged. In the result file the files are inserted in the sequence they are provided
- header: header of the result file and key for assigning column values from the source files to the result file
- source_header: pattern for each header of the source file to determine the column. The pattern is a regex without the enclosing slashes '/'
- key: first column value from the source file that is used as the first column in the target file. The key is optional.
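How a source_header pattern might map source columns onto the result header can be sketched as follows. The matching code itself is not shown on this page, so this is an assumption: `column_positions` is a hypothetical helper, and the sketch assumes the pattern carries one capture group whose match is compared with the entries of header.

```ruby
# Hypothetical helper: normalises the source header cells with the pattern and
# returns, for each target column, the index of the matching source column
# (nil when the source file has no such column).
def column_positions(source_cells, pattern, target_header)
  regex = Regexp.new(pattern) # pattern is given without enclosing slashes
  normalised = source_cells.map { |cell| cell.scan(regex).flatten.first }
  target_header.map { |col| normalised.index(col) }
end

source_cells = ["", "FY 2010", "FY 2011", "FY 2012"]
positions = column_positions(source_cells, "(\\d{4})", %w[2010 2011 2012 2013])
p positions
```

Here the header cell "FY 2010" is reduced to "2010" by the capture group, so it lines up with the 2010 column of the result even though the source label differs.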
# File 'lib/sycsvpro/merger.rb', line 73
def initialize(options = {})
  @outfile = options[:outfile]
  @header_cols = options[:header].split(',')
  @source_header = options[:source_header].split(',')
  @key = options[:key] ? options[:key].split(',') : []
  @has_key = !@key.empty?
  @files = options[:files].split(',')
  if @source_header.count != @files.count
    raise "file count has to be equal to source_header count"
  end
end
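The option handling of the constructor can be tried out without the gem using a small stand-in. `parse_merger_options` is a hypothetical function written for this illustration; it mirrors the comma splitting and the count check from the listing above and is not part of Sycsvpro.

```ruby
# Hypothetical stand-in for the constructor's option parsing.
def parse_merger_options(options = {})
  parsed = {
    outfile: options[:outfile],
    header_cols: options[:header].split(","),
    source_header: options[:source_header].split(","),
    key: options[:key] ? options[:key].split(",") : [], # key is optional
    files: options[:files].split(",")
  }
  # One source_header pattern is required per file.
  if parsed[:source_header].count != parsed[:files].count
    raise "file count has to be equal to source_header count"
  end
  parsed
end

ok = parse_merger_options(outfile: "out.csv",
                          files: "file1.csv,file2.csv",
                          header: "2010,2011",
                          source_header: "(\\d{4}),(\\d{4})",
                          key: "0,0")

mismatch_message = nil
begin
  # Three files but only one pattern: the count check fails.
  parse_merger_options(outfile: "out.csv",
                       files: "a.csv,b.csv,c.csv",
                       header: "2010",
                       source_header: "(\\d{4})")
rescue RuntimeError => e
  mismatch_message = e.message
end
```

Because the patterns are split on commas, a pattern that itself contains a comma (for example a quantifier such as `{1,4}`) would be split into two entries and trip the count check.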
Instance Attribute Details
#files ⇒ Object (readonly)
files to be merged based on header columns
# File 'lib/sycsvpro/merger.rb', line 44
def files
  @files
end
#header_cols ⇒ Object (readonly)
header columns
# File 'lib/sycsvpro/merger.rb', line 40
def header_cols
  @header_cols
end
#key ⇒ Object (readonly)
value that is used as the first column of a row
# File 'lib/sycsvpro/merger.rb', line 42
def key
  @key
end
#outfile ⇒ Object (readonly)
file that the result is written to
# File 'lib/sycsvpro/merger.rb', line 36
def outfile
  @outfile
end
#source_header ⇒ Object (readonly)
header patterns to be used to identify merge columns
# File 'lib/sycsvpro/merger.rb', line 38
def source_header
  @source_header
end
Instance Method Details
#execute ⇒ Object
Merges the files based on the provided parameters
# File 'lib/sycsvpro/merger.rb', line 86
def execute
  File.open(outfile, 'w') do |out|
    out.puts "#{';' unless @key.empty?}#{header_cols.join(';')}"
    files.each do |file|
      @current_key = create_current_key
      @current_source_header = @source_header.shift
      processed_header = false
      File.open(file).each_with_index do |line, index|
        next if line.chomp.empty?

        unless processed_header
          create_file_header unstring(line).split(';')
          processed_header = true
          next
        end

        out.puts create_line unstring(line).split(';')
      end
    end
  end
end
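The first line written by execute is the result header. A minimal sketch of that expression in isolation, assuming the same ';' separator as in the listing above; `header_line` is a hypothetical name used only here:

```ruby
# Hypothetical helper reproducing the header expression from #execute:
# a leading ';' reserves the key column when a key is given.
def header_line(header_cols, key)
  "#{';' unless key.empty?}#{header_cols.join(';')}"
end

with_key = header_line(%w[2010 2011 2012], ["0", "0"])
without_key = header_line(%w[2010 2011 2012], [])
puts with_key
puts without_key
```

With a key the header starts with an empty cell, which corresponds to the blank top-left cell in the merge.csv table of the overview.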