Class: IOStreams::Tabular::Header

Inherits:
Object
Defined in:
lib/io_streams/tabular/header.rb

Overview

Process files or streams that start with a header row.


Constructor Details

#initialize(columns: nil, allowed_columns: nil, required_columns: nil, skip_unknown: true) ⇒ Header

Header

Parameters

columns [Array<String>]
  Columns in this header.
  Note:
    It is recommended to keep all columns as strings to avoid issues when persisting
    to MongoDB, which converts symbol keys to strings.

allowed_columns [Array<String>]
  List of columns to allow, compared against the cleansed column names.
  Default: nil (allow all columns)
  Note:
    When supplied, any rejected columns are returned in the cleansed columns
    as nil so that they can be ignored during processing.

required_columns [Array<String>]
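  List of columns that must be present, compared against the cleansed column names.
  Default: nil (no columns are required)
  #cleanse! raises IOStreams::Errors::InvalidHeader when any of them are missing.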

skip_unknown [true|false]
  Default: true
  true:
    Skip columns not present in allowed_columns by cleansing them to nil.
    #to_hash skips these columns entirely, as if they were not in the file at all.
  false:
    #cleanse! raises IOStreams::Errors::InvalidHeader when a column is supplied that is not in allowed_columns.
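
The note on columns recommends strings because Hash lookups are type-sensitive: a symbol key and the equivalent string key are different keys. The plain-Ruby sketch below illustrates the failure mode, assuming a row that was saved with symbol keys and read back from MongoDB with string keys. The variable names are illustrative only, and nothing here calls the IOStreams API.

```ruby
# Columns declared as symbols (the pattern the note advises against).
symbol_columns = [:first_name, :last_name]
# Columns declared as strings (the recommended pattern).
string_columns = ["first_name", "last_name"]

# A row as it might come back after a MongoDB round trip:
# any symbol keys it was saved with are now strings.
stored_row = { "first_name" => "Jack", "last_name" => "Jones" }

# Symbol columns no longer match the stored keys, so every lookup is nil.
symbol_values = symbol_columns.map { |column| stored_row[column] }

# String columns match whether or not the row was round-tripped.
string_values = string_columns.map { |column| stored_row[column] }

p symbol_values # => [nil, nil]
p string_values # => ["Jack", "Jones"]
```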


# File 'lib/io_streams/tabular/header.rb', line 32

def initialize(columns: nil, allowed_columns: nil, required_columns: nil, skip_unknown: true)
  @columns          = columns
  @required_columns = required_columns
  @allowed_columns  = allowed_columns
  @skip_unknown     = skip_unknown
end

Instance Attribute Details

#allowed_columns ⇒ Object

Returns the value of attribute allowed_columns.



# File 'lib/io_streams/tabular/header.rb', line 5

def allowed_columns
  @allowed_columns
end

#columns ⇒ Object

Returns the value of attribute columns.



# File 'lib/io_streams/tabular/header.rb', line 5

def columns
  @columns
end

#required_columns ⇒ Object

Returns the value of attribute required_columns.



# File 'lib/io_streams/tabular/header.rb', line 5

def required_columns
  @required_columns
end

#skip_unknown ⇒ Object

Returns the value of attribute skip_unknown.



# File 'lib/io_streams/tabular/header.rb', line 5

def skip_unknown
  @skip_unknown
end
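
YARD documents these four readers with "Returns the value of attribute ...", the wording it generates for attributes declared with attr_reader or attr_accessor. Because #cleanse! assigns with self.columns = ..., a columns writer must also exist. The hypothetical HeaderAttributesSketch below is a minimal stand-in that assumes attr_accessor; it is not the gem's source.

```ruby
# Hypothetical stand-in, not the gem's source: attr_accessor generates the four
# readers documented above plus matching writers such as columns=.
class HeaderAttributesSketch
  attr_accessor :columns, :allowed_columns, :required_columns, :skip_unknown

  def initialize(columns: nil, allowed_columns: nil, required_columns: nil, skip_unknown: true)
    @columns          = columns
    @required_columns = required_columns
    @allowed_columns  = allowed_columns
    @skip_unknown     = skip_unknown
  end
end

header = HeaderAttributesSketch.new(columns: ["First Name"])
# The same kind of assignment #cleanse! performs with self.columns = ...
header.columns = ["first_name"]

p header.columns      # => ["first_name"]
p header.skip_unknown # => true
```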

Instance Method Details

#cleanse! ⇒ Object

Returns [Array<String>] the original (uncleansed) names of the columns that were ignored during cleansing.

Each column is cleansed as follows:

  • Leading and trailing whitespace is stripped.

  • All characters converted to lower case.

  • Spaces and ‘-’ are converted to ‘_’.

  • All characters except for letters, digits, and ‘_’ are stripped.
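
The four cleansing rules can be sketched in plain Ruby as below. cleanse_column_sketch is a hypothetical name: the helper that #cleanse! actually calls (cleanse_column) is not shown on this page, so treat this as an approximation that assumes ASCII letters and treats only the space character and '-' as separators.

```ruby
# Hypothetical approximation of the cleansing rules, applied in the listed order.
def cleanse_column_sketch(name)
  cleansed = name.to_s.strip.downcase      # rules 1 and 2: trim, then lower-case
  cleansed = cleansed.gsub(/[ \-]/, "_")   # rule 3: spaces and '-' become '_'
  cleansed.gsub(/[^a-z0-9_]/, "")          # rule 4: keep only letters, digits and '_'
end

p cleanse_column_sketch("  First Name ") # => "first_name"
p cleanse_column_sketch("Zip-Code")      # => "zip_code"
p cleanse_column_sketch("Amount (USD)")  # => "amount_usd"
```

The order matters: converting spaces to '_' before stripping other characters keeps "First Name" from collapsing into "firstname".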

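Notes

  • Returns [] without changes when columns is nil or empty.

  • Rejected columns are replaced with nil in #columns.

  • Raises IOStreams::Errors::InvalidHeader when skip_unknown is false and any column is not in allowed_columns.

  • Raises IOStreams::Errors::InvalidHeader when every column is rejected, leaving no non-nil columns after cleansing.

  • Raises IOStreams::Errors::InvalidHeader when any of required_columns is missing after cleansing.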



# File 'lib/io_streams/tabular/header.rb', line 49

def cleanse!
  return [] if columns.nil? || columns.empty?

  ignored_columns = []
  self.columns    = columns.collect do |column|
    cleansed = cleanse_column(column)
    if allowed_columns.nil? || allowed_columns.include?(cleansed)
      cleansed
    else
      ignored_columns << column
      nil
    end
  end

  if !skip_unknown && !ignored_columns.empty?
    raise(IOStreams::Errors::InvalidHeader, "Unknown columns after cleansing: #{ignored_columns.join(',')}")
  end

  if ignored_columns.size == columns.size
    raise(IOStreams::Errors::InvalidHeader, "All columns are unknown after cleansing: #{ignored_columns.join(',')}")
  end

  if required_columns
    missing_columns = required_columns - columns
    unless missing_columns.empty?
      raise(IOStreams::Errors::InvalidHeader, "Missing columns after cleansing: #{missing_columns.join(',')}")
    end
  end

  ignored_columns
end
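
To see the inputs and outputs of #cleanse! without installing the gem, the hypothetical HeaderSketch below copies its control flow. InvalidHeaderSketch stands in for IOStreams::Errors::InvalidHeader, and the private cleanse_column applies the rules listed above as an assumption about the gem's own helper.

```ruby
# Hypothetical stand-ins; neither class is part of IOStreams.
class InvalidHeaderSketch < StandardError; end

class HeaderSketch
  attr_accessor :columns, :allowed_columns, :required_columns, :skip_unknown

  def initialize(columns: nil, allowed_columns: nil, required_columns: nil, skip_unknown: true)
    @columns          = columns
    @allowed_columns  = allowed_columns
    @required_columns = required_columns
    @skip_unknown     = skip_unknown
  end

  # Same control flow as #cleanse! above.
  def cleanse!
    return [] if columns.nil? || columns.empty?

    ignored_columns = []
    self.columns = columns.collect do |column|
      cleansed = cleanse_column(column)
      if allowed_columns.nil? || allowed_columns.include?(cleansed)
        cleansed
      else
        ignored_columns << column # the original, uncleansed name is reported
        nil
      end
    end

    if !skip_unknown && !ignored_columns.empty?
      raise InvalidHeaderSketch, "Unknown columns after cleansing: #{ignored_columns.join(',')}"
    end
    if ignored_columns.size == columns.size
      raise InvalidHeaderSketch, "All columns are unknown after cleansing: #{ignored_columns.join(',')}"
    end
    if required_columns
      missing_columns = required_columns - columns
      unless missing_columns.empty?
        raise InvalidHeaderSketch, "Missing columns after cleansing: #{missing_columns.join(',')}"
      end
    end

    ignored_columns
  end

  private

  # Assumed cleansing rules: trim, lower-case, spaces/'-' to '_', keep [a-z0-9_].
  def cleanse_column(name)
    name.to_s.strip.downcase.gsub(/[ \-]/, "_").gsub(/[^a-z0-9_]/, "")
  end
end

header = HeaderSketch.new(
  columns:         ["First Name", "Last-Name", "Extra Col"],
  allowed_columns: %w[first_name last_name]
)
ignored = header.cleanse!
p header.columns # => ["first_name", "last_name", nil]
p ignored        # => ["Extra Col"]
```

"Extra Col" is reported with its original spelling, while #columns holds the cleansed names with nil in the rejected position.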

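#to_array(row, cleanse = true) ⇒ Object

Marshal to Array from Array or Hash by applying this header.

Parameters:

cleanse [true|false]
  Whether to cleanse and narrow the supplied hash to just those columns in this header
  before extracting its values. Only applies when row is a Hash.

When row is a Hash and columns are set, the values are returned in header column order,
with nil for any column missing from the hash. An Array is returned unchanged.
Raises IOStreams::Errors::TypeMismatch when row cannot be converted, for example a Hash
when the header columns have not been set.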



# File 'lib/io_streams/tabular/header.rb', line 105

def to_array(row, cleanse = true)
  if row.is_a?(Hash) && columns
    row = cleanse_hash(row) if cleanse
    row = columns.collect { |column| row[column] }
  end

  unless row.is_a?(Array)
    raise(
      IOStreams::Errors::TypeMismatch,
      "Don't know how to convert #{row.class.name} to an Array without the header columns being set."
    )
  end

  row
end
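
A sketch of the Hash-to-Array step in #to_array with cleanse = false. to_array_sketch is a hypothetical name, TypeError stands in for IOStreams::Errors::TypeMismatch, and the cleanse_hash step used when cleanse = true is left out because its implementation is not shown here.

```ruby
# Hypothetical sketch of #to_array without the cleanse_hash step.
def to_array_sketch(row, columns)
  # A Hash is read in header column order; missing keys become nil and
  # keys that are not header columns are dropped.
  row = columns.collect { |column| row[column] } if row.is_a?(Hash) && columns

  unless row.is_a?(Array)
    raise TypeError, "Don't know how to convert #{row.class.name} to an Array without the header columns being set."
  end

  row
end

columns = ["first_name", "last_name", "age"]
p to_array_sketch({ "last_name" => "Jones", "first_name" => "Jack", "zip" => "12345" }, columns)
# => ["Jack", "Jones", nil]
p to_array_sketch(["already", "an", "array"], columns)
# => ["already", "an", "array"]
```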

#to_hash(row, cleanse = true) ⇒ Object

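Marshal to Hash from Array or Hash by applying this header.

Parameters:

cleanse [true|false]
  Whether to cleanse and narrow the supplied hash to just those columns in this header.
  Only applies when row is already a Hash and the header columns are set.
  Useful to turn off narrowing when the input data is already trusted.

Returns nil for a blank row. Raises IOStreams::Errors::InvalidHeader when row is an Array
and no header columns have been set, and IOStreams::Errors::TypeMismatch when row is
neither an Array nor a Hash.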


# File 'lib/io_streams/tabular/header.rb', line 88

def to_hash(row, cleanse = true)
  return if IOStreams::Utils.blank?(row)

  case row
  when Array
    unless columns
      raise(IOStreams::Errors::InvalidHeader, "Missing mandatory header when trying to convert a row into a hash")
    end

    array_to_hash(row)
  when Hash
    cleanse && columns ? cleanse_hash(row) : row
  else
    raise(IOStreams::Errors::TypeMismatch, "Don't know how to convert #{row.class.name} to a Hash")
  end
end
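
#to_hash delegates Array rows to array_to_hash, which this page does not show. The hypothetical sketch below assumes it pairs each header column with the value at the same position and skips columns that #cleanse! set to nil, matching the skip_unknown description. It also treats only nil and empty rows as blank in place of IOStreams::Utils.blank?, returns Hash rows unchanged instead of calling cleanse_hash, and uses ArgumentError and TypeError in place of IOStreams::Errors::InvalidHeader and IOStreams::Errors::TypeMismatch.

```ruby
# Hypothetical stand-in for the private array_to_hash helper.
def array_to_hash_sketch(row, columns)
  hash = {}
  columns.each_with_index do |column, index|
    # Assumption: columns rejected by #cleanse! (nil) are skipped entirely.
    hash[column] = row[index] unless column.nil?
  end
  hash
end

def to_hash_sketch(row, columns)
  # Stand-in for IOStreams::Utils.blank?: treat nil and empty rows as blank.
  return if row.nil? || (row.respond_to?(:empty?) && row.empty?)

  case row
  when Array
    raise ArgumentError, "Missing mandatory header when trying to convert a row into a hash" unless columns

    array_to_hash_sketch(row, columns)
  when Hash
    row # cleanse_hash is not shown on this page, so the Hash is returned as-is
  else
    raise TypeError, "Don't know how to convert #{row.class.name} to a Hash"
  end
end

columns = ["first_name", nil, "last_name"] # nil marks a column rejected by #cleanse!
person = to_hash_sketch(["Jack", "ignored", "Jones"], columns)
p person # a Hash with "first_name" => "Jack" and "last_name" => "Jones"
p to_hash_sketch([], columns) # => nil
```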