Class: Traject::DelimitedWriter

Inherits:
LineWriter
Defined in:
lib/traject/delimited_writer.rb

Overview

A simple line writer that uses settings to determine how to produce a delimited text file (tab-delimited by default).

Relevant settings:

  • output_file -- the file to write to
  • output_stream -- the stream to write to, if defined and output_file is not
  • delimited_writer.delimiter -- What to separate fields with; default is tab
  • delimited_writer.internal_delimiter -- Delimiter within a field, for multiple values. Default is pipe ( | )
  • delimited_writer.fields -- comma-separated list of the fields to output
  • delimited_writer.header (true/false) -- boolean that determines if we should output a header row. Default is true
  • delimited_writer.escape -- What to do if a value itself contains the delimiter or internal_delimiter. If unset, the writer follows the procedure below. If set, each occurrence is replaced with the character(s) given
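
As a rough illustration of how these settings resolve, the sketch below uses a plain Ruby Hash in place of traject's settings object (an assumption, so the example runs without the gem) and applies the same defaults the constructor uses. The file name and field names are made up for the example.

```ruby
# Hypothetical settings; in traject these would come from the indexer configuration.
settings = {
  'output_file'             => 'out.tsv',
  'delimited_writer.fields' => 'id,title,author',
  'delimited_writer.header' => 'false'
}

# Resolve each setting the way DelimitedWriter#initialize does.
fields             = settings['delimited_writer.fields'].split(',')
delimiter          = settings['delimited_writer.delimiter'] || "\t"          # default: tab
internal_delimiter = settings['delimited_writer.internal_delimiter'] || '|'  # default: pipe
header             = settings['delimited_writer.header'].to_s != 'false'     # default: true
```

Because the header check compares the string form against 'false', both the boolean false and the string 'false' turn the header row off, while an unset value leaves it on.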

If delimited_writer.escape is not set, the writer will automatically escape delimiters/internal_delimiters in the following way:

  • If the delimiter is a tab, replace tabs in values with a single space
  • If the delimiter is anything else, prefix each occurrence of it in a value with a backslash
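
The default rule above can be sketched as two small standalone functions. The names default_escape_for and escape_value are hypothetical and not part of traject. The block form of gsub is used here so the backslash in the replacement is always taken literally.

```ruby
# Default rule: a tab becomes a single space, any other delimiter gets a backslash prefix.
def default_escape_for(delim)
  delim == "\t" ? ' ' : '\\' + delim
end

# Replace every literal occurrence of delim in value with its escaped form.
def escape_value(value, delim)
  replacement = default_escape_for(delim)
  value.to_s.gsub(delim) { replacement }
end

tab_safe  = escape_value("a\tb", "\t")  # the tab is replaced by a space
pipe_safe = escape_value('x|y', '|')    # the pipe is prefixed with a backslash
```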

Direct Known Subclasses

CSVWriter

Instance Attribute Summary

Attributes inherited from LineWriter

#output_file, #settings, #write_mutex

Instance Method Summary

Methods inherited from LineWriter

#close, #open_output_file, #put, #should_close_stream?

Constructor Details

#initialize(settings) ⇒ DelimitedWriter

Returns a new instance of DelimitedWriter.


# File 'lib/traject/delimited_writer.rb', line 29

def initialize(settings)
  super

  # fields to output

  begin
    @fields = settings['delimited_writer.fields'].split(",")
  rescue NoMethodError => e
  end

  if e or @fields.empty?
    raise ArgumentError.new("#{self.class.name} must have a comma-delimited list of field names to output set in setting 'delimited_writer.fields'")
  end

  self.delimiter = settings['delimited_writer.delimiter'] || "\t"
  self.internal_delimiter = settings['delimited_writer.internal_delimiter'] || '|'
  self.header = settings['delimited_writer.header'].to_s != 'false'

  # Output the header if need be
  write_header if @header
end
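
The field check in the constructor can be sketched without the empty rescue clause. The helper name parse_fields is hypothetical, and the error message is shortened. This is an equivalent-outcome variant for illustration, not the library's code.

```ruby
# Hypothetical helper: turn the 'delimited_writer.fields' setting into a list of names.
# nil.to_s is "", and "".split(",") is [], so a missing or empty setting ends in the same error.
def parse_fields(settings)
  fields = settings['delimited_writer.fields'].to_s.split(',')
  raise ArgumentError, "must set 'delimited_writer.fields'" if fields.empty?
  fields
end

names  = parse_fields({ 'delimited_writer.fields' => 'id,title' })
# split(",") does not trim whitespace, so a space after a comma stays in the field name.
spaced = 'id, title'.split(',')
```

Since the field names are later used as keys into the output hash, a setting written with spaces after the commas would likely look up keys that do not exist.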

Instance Attribute Details

#delimiter ⇒ Object

Returns the value of attribute delimiter


# File 'lib/traject/delimited_writer.rb', line 26

def delimiter
  @delimiter
end

#edelim ⇒ Object (readonly)

Returns the value of attribute edelim


# File 'lib/traject/delimited_writer.rb', line 26

def edelim
  @edelim
end

#eidelim ⇒ Object (readonly)

Returns the value of attribute eidelim


# File 'lib/traject/delimited_writer.rb', line 26

def eidelim
  @eidelim
end

#header ⇒ Object

Returns the value of attribute header


# File 'lib/traject/delimited_writer.rb', line 27

def header
  @header
end

#internal_delimiter ⇒ Object

Returns the value of attribute internal_delimiter


# File 'lib/traject/delimited_writer.rb', line 26

def internal_delimiter
  @internal_delimiter
end

Instance Method Details

#_write(data) ⇒ Object


# File 'lib/traject/delimited_writer.rb', line 74

def _write(data)
  output_file.puts(data.join(delimiter))
end

#escape(x) ⇒ Object

Escape the delimiters in whatever way has been defined


# File 'lib/traject/delimited_writer.rb', line 84

def escape(x)
  x = x.to_s
  x.gsub! @delimiter, @edelim if @delimiter
  x.gsub! @internal_delimiter, @eidelim
  x
end
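
One detail worth noting about the shape of #escape: String#to_s returns the receiver itself, and gsub! modifies it in place, so a String passed in is changed for the caller too. The sketch below shows this with a fixed pipe delimiter. Both function names are hypothetical, and whether any caller depends on the in-place behavior is not stated in the source.

```ruby
# Same shape as #escape, with a hard-coded pipe delimiter for the example.
def escape_in_place(x)
  x = x.to_s                  # for a String this is the very same object
  x.gsub!('|') { '\\|' }      # mutates that object
  x
end

# Non-mutating alternative: gsub returns a new String and leaves the argument alone.
def escape_copy(x)
  x.to_s.gsub('|') { '\\|' }
end

mutated = 'a|b'.dup
escape_in_place(mutated)      # mutated itself now contains the backslash

kept   = 'a|b'.dup
copied = escape_copy(kept)    # kept is unchanged, copied holds the escaped text
```

A frozen String passed to the in-place form would raise an error when gsub! finds a match, which the copying form avoids.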

#escaped_delimiter(d) ⇒ Object


# File 'lib/traject/delimited_writer.rb', line 51

def escaped_delimiter(d)
  return nil if d.nil?
  d == "\t" ? ' ' : '\\' + d
end

#output_values(raw) ⇒ Object

Derive actual output field values from the raw values


# File 'lib/traject/delimited_writer.rb', line 93

def output_values(raw)
  raw.map do |x|
    if x.is_a? Array
      x.map!{|s| escape(s)}
      x.join(@internal_delimiter)
    else
      escape(x)
    end
  end
end
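
To make the handling of multi-valued fields concrete, here is a standalone sketch of the same mapping. The function names and the keyword parameters are assumptions made so the example runs outside the class, and it uses the non-mutating map rather than map!.

```ruby
# Escape one value using the default rule (tab -> space, otherwise backslash prefix).
def escape_sketch(value, delimiter, internal)
  rule = ->(d) { d == "\t" ? ' ' : '\\' + d }
  value.to_s.gsub(delimiter) { rule.call(delimiter) }
            .gsub(internal)  { rule.call(internal) }
end

# Arrays are escaped element by element and joined with the internal delimiter;
# anything else is escaped as a single value (nil becomes an empty string).
def output_values_sketch(raw, delimiter: "\t", internal: '|')
  raw.map do |x|
    if x.is_a?(Array)
      x.map { |s| escape_sketch(s, delimiter, internal) }.join(internal)
    else
      escape_sketch(x, delimiter, internal)
    end
  end
end

row = output_values_sketch(['001', ['Smith, J.', 'Doe|Jane'], nil])
```

Here the pipe inside 'Doe|Jane' is escaped before joining, so it can still be told apart from the pipe that separates the two values.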

#raw_output_values(context) ⇒ Object

Get the output values out of the context


# File 'lib/traject/delimited_writer.rb', line 79

def raw_output_values(context)
  context.output_hash.values_at(*@fields)
end
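
Hash#values_at returns values in the order of the requested keys and yields nil for keys that are absent. A small sketch, with a plain Hash standing in for context.output_hash and made-up field names:

```ruby
# Stand-in for context.output_hash; array values are an assumption about how traject stores fields.
output_hash = {
  'id'    => ['001'],
  'title' => ['Moby Dick'],
  'extra' => ['not requested']
}

fields = %w[title id author]

# Order follows fields, not the hash; 'author' is missing and comes back as nil.
raw = output_hash.values_at(*fields)
```

The nil for a missing field later passes through to_s in #escape, so it ends up as an empty column rather than shifting the remaining columns.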

#serialize(context) ⇒ Object

Return the escaped output values for a record. Joining them with the delimiter happens in #_write.


# File 'lib/traject/delimited_writer.rb', line 105

def serialize(context)
  output_values(raw_output_values(context))
end

#write_header ⇒ Object


# File 'lib/traject/delimited_writer.rb', line 70

def write_header
  _write(@fields)
end
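
Putting #write_header and #_write together, the output for a header plus one record can be sketched with a StringIO standing in for output_file. The field names and values are invented for the example.

```ruby
require 'stringio'

out       = StringIO.new   # stand-in for the writer's output_file
delimiter = "\t"
fields    = %w[id title]

# write_header hands the field names straight to _write, so they are joined without escaping.
out.puts(fields.join(delimiter))

# A record's values, already escaped, are joined and written the same way.
out.puts(['001', 'Moby Dick'].join(delimiter))

written = out.string
```

Because the header goes through the same join without #escape, a field name that itself contains the delimiter would appear to add a column to the header row.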