Class: Sycsvpro::Unique

Inherits:
Object
Includes:
Dsl
Defined in:
lib/sycsvpro/unique.rb

Overview

Removes duplicate rows. Two rows count as duplicates when their key column values match; the first occurrence is kept.

| Name | Street | Town | Country |
| ---- | ------ | ---- | ------- |
| Jane | Canal  | Win  | CA      |
| Jack | Long   | Van  | CA      |
| Jean | Sing   | Ma   | DE      |
| Jane | Canal  | Win  | CA      |

Removing duplicates based on column 0 (Name) yields:

| Name | Street | Town | Country |
| ---- | ------ | ---- | ------- |
| Jane | Canal  | Win  | CA      |
| Jack | Long   | Van  | CA      |
| Jean | Sing   | Ma   | DE      |
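
The overview example can be reproduced in plain Ruby. The following is a minimal sketch of key-based de-duplication, not the Sycsvpro implementation; `unique_rows` and its parameters are hypothetical names introduced for illustration.

```ruby
require 'set'

# Hypothetical helper, not part of Sycsvpro.
# Keeps the first row seen for each combination of key column values.
def unique_rows(rows, key_cols)
  seen = Set.new
  rows.select do |row|
    key = row.values_at(*key_cols)
    # Set#add? returns nil when the key is already present,
    # so repeated keys are filtered out.
    seen.add?(key)
  end
end

sample = [
  %w[Jane Canal Win CA],
  %w[Jack Long Van CA],
  %w[Jean Sing Ma DE],
  %w[Jane Canal Win CA]
]

unique_rows(sample, [0]).each { |row| puts row.join(' | ') }
```

With the four sample rows and key column 0, only the second Jane row is dropped.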

Constant Summary

Constants included from Dsl

Dsl::COMMA_SPLITTER_REGEX

Instance Attribute Summary

Instance Method Summary

Methods included from Dsl

#clean_up, #params, #rows, #split_by_comma_regex, #str2utf8, #unstring, #write_to

Constructor Details

#initialize(options = {}) ⇒ Unique

Creates a new Unique. Example call:

Sycsvpro::Unique.new(infile: "infile.csv",
                     outfile: "outfile.csv",
                     rows:    "1,3-4",
                     cols:    "0,2,4-6",
                     key:     "0,1").execute
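
The `rows`, `cols` and `key` options are passed as strings such as "1,3-4". The exact grammar accepted by RowFilter and ColumnFilter is not shown on this page. Assuming the strings are comma-separated indices with inclusive ranges, a rough sketch of how such a specification could be expanded looks like this; `expand_spec` is a hypothetical helper, not a Sycsvpro method.

```ruby
# Hypothetical helper, not part of Sycsvpro.
# Assumes a spec is a comma-separated list of non-negative indices
# and inclusive ranges, e.g. "0,2,4-6".
def expand_spec(spec)
  spec.split(',').flat_map do |part|
    part = part.strip
    if part.include?('-')
      first, last = part.split('-').map { |s| Integer(s.strip) }
      (first..last).to_a
    else
      [Integer(part)]
    end
  end
end

p expand_spec('0,2,4-6')
```

Under that assumption, the `key: "0,1"` option in the call above would select the first two columns as the key.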


# File 'lib/sycsvpro/unique.rb', line 42

def initialize(options = {})
  @infile  = options[:infile]
  @outfile = options[:outfile]
  @row_filter = RowFilter.new(options[:rows], df: options[:df])
  @col_filter = ColumnFilter.new(options[:cols], df: options[:df])
  @key_filter = ColumnFilter.new(options[:key], df: options[:df])
  @keys = Set.new
end

Instance Attribute Details

#col_filter ⇒ Object (readonly)

Filter that is applied to columns.

# File 'lib/sycsvpro/unique.rb', line 33

def col_filter
  @col_filter
end

#infile ⇒ Object (readonly)

Input file containing the data that is operated on.

# File 'lib/sycsvpro/unique.rb', line 27

def infile
  @infile
end

#outfile ⇒ Object (readonly)

Output file to which the result is written.

# File 'lib/sycsvpro/unique.rb', line 29

def outfile
  @outfile
end

#row_filter ⇒ Object (readonly)

Filter that is applied to rows.

# File 'lib/sycsvpro/unique.rb', line 31

def row_filter
  @row_filter
end

Instance Method Details

#execute ⇒ Object

Removes the duplicates from infile and writes the result to outfile. Empty lines are skipped, and only the first row seen for each key is written.

# File 'lib/sycsvpro/unique.rb', line 52

def execute
  File.open(@outfile, 'w') do |out|
    File.open(@infile, 'r').each_with_index do |line, index|
      line = line.chomp

      next if line.empty?
      
      line = unstring(line).chomp

      extraction = col_filter.process(row_filter.process(line, row: index))

      next unless extraction

      key = @key_filter.process(line)
      
      unless @keys.include? key
        out.puts extraction
        @keys << key
      end
    end
  end
end
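
The same streaming approach can be tried without the library. The sketch below is a simplified stand-in for the loop above, not the Sycsvpro implementation: `dedupe_file` is a hypothetical name, it assumes plain comma-separated lines without quoted fields, and it leaves out the row and column filters. It reads the input with `File.foreach`, which closes the input file when iteration finishes.

```ruby
require 'set'

# Hypothetical stand-in for the execute loop, not part of Sycsvpro.
# Assumes simple comma-separated lines without quoted fields.
def dedupe_file(infile, outfile, key_cols)
  seen = Set.new
  File.open(outfile, 'w') do |out|
    # File.foreach yields each line and closes the file afterwards.
    File.foreach(infile) do |line|
      line = line.chomp
      next if line.empty?

      key = line.split(',').values_at(*key_cols)
      # Write the line only the first time its key is seen.
      out.puts line if seen.add?(key)
    end
  end
end
```

A call such as `dedupe_file("infile.csv", "outfile.csv", [0])` would keep the first line for each distinct value in column 0. Only the set of keys is held in memory, so the input can be larger than available RAM as long as the keys fit.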