Class: Sycsvpro::Join

Inherits:
Object
  • Object
show all
Includes:
Dsl
Defined in:
lib/sycsvpro/join.rb

Overview

Join joins two files based on a join key value. Example File 1 (infile)

|Name |ID |
|-----|---|
|Hank |123|
|Frank|234|
|Mia  |345|
|Moira|234|

File 2 (source)

|Company|Phone|ID |
|-------|-----|---|
|Siem   |4848 |123|
|Helo   |993  |345|
|Wara   |3333 |234|

File 3 (outfile)

|Name |ID |Company|Phone|
|-----|---|-------|-----|
|Hank |123|Siem   |4848 |
|Frank|234|Wara   |3333 |
|Mia  |345|Helo   |993  |
|Moira|234|Wara   |3333 |

Constant Summary

Constants included from Dsl

Dsl::COMMA_SPLITTER_REGEX

Instance Attribute Summary collapse

Instance Method Summary collapse

Methods included from Dsl

#clean_up, #params, #rows, #split_by_comma_regex, #str2utf8, #unstring, #write_to

Constructor Details

#initialize(options = {}) ⇒ Join

Creates a Join which can be invoked as follows :call-seq:

Sycsvpro::Join.new(infile:  "in.csv",
                   outfile: "out.csv",
                   source:  "source.csv",
                   rows:    "1-eof",
                   cols:    "0,1",
                   pos:     "2,3",
                   joins:   "2=1",
                   headerless: true,
                   header:  "*",
                   insert_header: "Company,Phone").execute
infile

csv file to operate on

outfile

csv file with the result

source

csv file that contains the values to join to infile

rows

rows to consider for operation. Rows that don’t match the pattern will be skipped for operation

cols

columns to insert from the source to the infile

pos

column positions where to insert the values and the insert_header columns

joins

columns that match in infile and source. source_column=infile_column

headerless

indicates whether the infile has a header (default true)

header

Header of the csv file

insert_header

column names of the to be inserted values



80
81
82
83
84
85
86
87
88
89
90
91
92
93
# File 'lib/sycsvpro/join.rb', line 80

def initialize(options = {})
  @infile     = options[:infile]
  @outfile    = options[:outfile]
  @source     = options[:source]
  @row_filter = RowFilter.new(options[:rows], df: options[:df])
  @positions  = create_joiners(options[:joins], 
                               options[:cols], 
                               options[:pos])
  @headerless = options[:headerless].nil? ? false : options[:headerless]
  @header     = Header.new(options[:header] || '*', 
                           pos:    @positions, 
                           insert: options[:insert_header])
  create_lookup_table
end

Instance Attribute Details

#headerObject (readonly)

header of the outfile



50
51
52
# File 'lib/sycsvpro/join.rb', line 50

def header
  @header
end

#headerlessObject (readonly)

indicates whether the infile is headerless



52
53
54
# File 'lib/sycsvpro/join.rb', line 52

def headerless
  @headerless
end

#infileObject (readonly)

infile contains the data that is operated on



40
41
42
# File 'lib/sycsvpro/join.rb', line 40

def infile
  @infile
end

#outfileObject (readonly)

outfile is the file where the result is written to



42
43
44
# File 'lib/sycsvpro/join.rb', line 42

def outfile
  @outfile
end

#positionsObject (readonly)

posititon where to insert the columns into the infile



48
49
50
# File 'lib/sycsvpro/join.rb', line 48

def positions
  @positions
end

#row_filterObject (readonly)

filter that is used for rows



46
47
48
# File 'lib/sycsvpro/join.rb', line 46

def row_filter
  @row_filter
end

#sourceObject (readonly)

source file from where columns are inserted into infile



44
45
46
# File 'lib/sycsvpro/join.rb', line 44

def source
  @source
end

Instance Method Details

#executeObject

Executes the join



96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
# File 'lib/sycsvpro/join.rb', line 96

def execute
  processed_header = headerless ? true : false

  File.open(outfile, 'w') do |out|
    File.open(infile).each_with_index do |line, index|
      line = line.chomp

      next if line.empty?

      line = unstring(line).chomp

      unless processed_header
        header_line = header.process(line)
        out.puts header unless header_line.empty?
        processed_header = true
        next
      end

      next if row_filter.process(line, row: index).nil?

      values = line.split(';')
      target = values.dup

      @positions.sort.each { |p| target.insert(p, "") }
      
      @joiners.each do |joiner|
        key = values[joiner.join[1]]
        row = joiner.lookup[:rows][key] || []
        joiner.pos.each_with_index { |p,i| target[p] = row[i] }
      end

      out.puts target.join(';')
    end
  end
end