Class: IOStreams::Tabular::Header

Inherits:
Object
Defined in:
lib/io_streams/tabular/header.rb

Overview

Process files / streams that start with a header.

Constant Summary

IGNORE_PREFIX = "__rejected__".freeze

Column names that begin with this prefix have been rejected and should be ignored.

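A later processing step can recognise rejected columns by this prefix alone. The sketch below redefines the constant locally so it runs without the iostreams gem. `accepted_columns` is a hypothetical helper used for illustration only; it is not part of the library.

```ruby
# Same value as IOStreams::Tabular::Header::IGNORE_PREFIX, redefined here so
# the sketch runs without the iostreams gem.
IGNORE_PREFIX = "__rejected__".freeze

# Hypothetical downstream helper: drop every column rejected during cleansing.
def accepted_columns(columns)
  columns.reject { |name| name.start_with?(IGNORE_PREFIX) }
end

columns = ["name", "__rejected__Unknown Column", "zip_code"]
p accepted_columns(columns) # => ["name", "zip_code"]
```
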
Constructor Details

#initialize(columns: nil, allowed_columns: nil, required_columns: nil, skip_unknown: true) ⇒ Header

Returns a new instance of Header.

Parameters

columns [Array<String>]
  Columns in this header.
  Note:
    It is recommended to keep all columns as strings to avoid issues when persisting
    to MongoDB, which converts symbol keys to strings.

allowed_columns [Array<String>]
  List of columns to allow.
  Default: nil (allow all columns)
  Note:
  * So that rejected columns can be identified in subsequent steps, they will be prefixed with `__rejected__`.
    For example, `Unknown Column` would be cleansed as `__rejected__Unknown Column`.

required_columns [Array<String>]
  List of columns that must be present after cleansing, otherwise IOStreams::Errors::InvalidHeader is raised.

skip_unknown [true|false]
  true:
    Columns not present in allowed_columns are kept, but renamed with the `__rejected__` prefix (IGNORE_PREFIX).
    #to_hash skips these columns entirely, as if they were not in the file at all.
  false:
    Raises IOStreams::Errors::InvalidHeader when a column is supplied that is not in allowed_columns.


# File 'lib/io_streams/tabular/header.rb', line 35

def initialize(columns: nil, allowed_columns: nil, required_columns: nil, skip_unknown: true)
  @columns          = columns
  @required_columns = required_columns
  @allowed_columns  = allowed_columns
  @skip_unknown     = skip_unknown
end

Instance Attribute Details

#allowed_columns ⇒ Object

Returns the value of attribute allowed_columns.



# File 'lib/io_streams/tabular/header.rb', line 8

def allowed_columns
  @allowed_columns
end

#columns ⇒ Object

Returns the value of attribute columns.



# File 'lib/io_streams/tabular/header.rb', line 8

def columns
  @columns
end

#required_columns ⇒ Object

Returns the value of attribute required_columns.



# File 'lib/io_streams/tabular/header.rb', line 8

def required_columns
  @required_columns
end

#skip_unknown ⇒ Object

Returns the value of attribute skip_unknown.



# File 'lib/io_streams/tabular/header.rb', line 8

def skip_unknown
  @skip_unknown
end

Instance Method Details

#cleanse! ⇒ Object

Returns [Array<String>] the list of columns (in their original, uncleansed form) that were ignored during cleansing.

Each column is cleansed as follows:

  • Leading and trailing whitespace is stripped.

  • All characters converted to lower case.

  • Spaces and `-` are converted to `_`.

  • All characters except for letters, digits, and `_` are stripped.
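
The rules above can be sketched as a short string pipeline. This is an assumed reconstruction from the bullet points only: the library's own private helper is not shown on this page and may differ in detail (for example, in how runs of spaces are handled). `cleanse_column_sketch` is a hypothetical name.

```ruby
# Assumed implementation of the per-column cleansing rules listed above.
def cleanse_column_sketch(name)
  cleansed = name.to_s.strip.downcase     # trim whitespace, then lower-case
  cleansed = cleansed.gsub(/[ -]/, "_")   # each space or '-' becomes '_'
  cleansed.gsub(/[^[:alnum:]_]/, "")      # drop anything that is not a letter, digit or '_'
end

p cleanse_column_sketch("  First Name ") # => "first_name"
p cleanse_column_sketch("Zip-Code")      # => "zip_code"
p cleanse_column_sketch("Amount ($)")    # => "amount_"
```
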

Notes:

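  • So that rejected columns can be identified in subsequent steps, they are prefixed with `__rejected__`. For example, `Unknown Column` would be cleansed as `__rejected__Unknown Column`.

  • Raises IOStreams::Errors::InvalidHeader when every column is rejected (no accepted columns are left after cleansing), when skip_unknown is false and any column is rejected, or when any of the required_columns is missing after cleansing.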



# File 'lib/io_streams/tabular/header.rb', line 54

def cleanse!
  return [] if columns.nil? || columns.empty?

  ignored_columns = []
  self.columns    = columns.collect do |column|
    cleansed = cleanse_column(column)
    if allowed_columns.nil? || allowed_columns.include?(cleansed)
      cleansed
    else
      ignored_columns << column
      "#{IGNORE_PREFIX}#{column}"
    end
  end

  if !skip_unknown && !ignored_columns.empty?
    raise(IOStreams::Errors::InvalidHeader, "Unknown columns after cleansing: #{ignored_columns.join(',')}")
  end

  if ignored_columns.size == columns.size
    raise(IOStreams::Errors::InvalidHeader, "All columns are unknown after cleansing: #{ignored_columns.join(',')}")
  end

  if required_columns
    missing_columns = required_columns - columns
    unless missing_columns.empty?
      raise(IOStreams::Errors::InvalidHeader, "Missing columns after cleansing: #{missing_columns.join(',')}")
    end
  end

  ignored_columns
end
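
To see how allowed_columns, required_columns and skip_unknown interact, the following standalone sketch restates the control flow of #cleanse! as a plain function and runs it on a sample header. `cleanse_columns` and the local `InvalidHeader` class are hypothetical stand-ins: the real method raises IOStreams::Errors::InvalidHeader and reassigns the header's own columns, and the column-cleansing helper here assumes the rules listed above.

```ruby
# Hypothetical stand-in for IOStreams::Errors::InvalidHeader.
class InvalidHeader < StandardError; end

IGNORE_PREFIX = "__rejected__".freeze

# Assumed per-column cleansing rules (see the bullet list above).
def cleanse_column(name)
  name.to_s.strip.downcase.gsub(/[ -]/, "_").gsub(/[^[:alnum:]_]/, "")
end

# Standalone restatement of #cleanse!. Returns [cleansed_columns, ignored_columns].
def cleanse_columns(columns, allowed_columns: nil, required_columns: nil, skip_unknown: true)
  return [columns, []] if columns.nil? || columns.empty?

  ignored = []
  cleansed_columns = columns.map do |column|
    cleansed = cleanse_column(column)
    if allowed_columns.nil? || allowed_columns.include?(cleansed)
      cleansed
    else
      ignored << column                   # remember the original name
      "#{IGNORE_PREFIX}#{column}"         # keep its position, but mark it as rejected
    end
  end

  if !skip_unknown && !ignored.empty?
    raise InvalidHeader, "Unknown columns after cleansing: #{ignored.join(',')}"
  end
  if ignored.size == cleansed_columns.size
    raise InvalidHeader, "All columns are unknown after cleansing: #{ignored.join(',')}"
  end

  if required_columns
    missing = required_columns - cleansed_columns
    raise InvalidHeader, "Missing columns after cleansing: #{missing.join(',')}" unless missing.empty?
  end

  [cleansed_columns, ignored]
end

cols, ignored = cleanse_columns(["First Name", "Zip-Code", "Unknown Column"],
                                allowed_columns: ["first_name", "zip_code"])
p cols    # => ["first_name", "zip_code", "__rejected__Unknown Column"]
p ignored # => ["Unknown Column"]
```

Note that the rejected column keeps its position in the header, so values in later Array rows still line up with the right column names.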

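#to_array(row, cleanse = true) ⇒ Object

Marshal to Array from Hash or Array by applying this header.

When row is a Hash and columns are set, the hash is cleansed (if cleanse is true) and its values are returned in header-column order. An Array row is returned as-is. Any other input, including a Hash when no columns are set, raises IOStreams::Errors::TypeMismatch.
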


# File 'lib/io_streams/tabular/header.rb', line 110

def to_array(row, cleanse = true)
  if row.is_a?(Hash) && columns
    row = cleanse_hash(row) if cleanse
    row = columns.collect { |column| row[column] }
  end

  unless row.is_a?(Array)
    raise(
      IOStreams::Errors::TypeMismatch,
      "Don't know how to convert #{row.class.name} to an Array without the header columns being set."
    )
  end

  row
end
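
The Hash-to-Array path of #to_array amounts to looking up each header column in the hash. This sketch covers only the cleanse = false case, because the cleanse_hash helper is not shown on this page. `row_to_array` and the local `TypeMismatch` class are hypothetical stand-ins for illustration.

```ruby
# Hypothetical stand-in for IOStreams::Errors::TypeMismatch.
class TypeMismatch < StandardError; end

# Mirrors #to_array with cleanse = false: a Hash row becomes an Array in
# header-column order (missing keys become nil, extra keys are dropped);
# an Array row is returned unchanged.
def row_to_array(row, columns)
  row = columns.map { |column| row[column] } if row.is_a?(Hash) && columns

  unless row.is_a?(Array)
    raise TypeMismatch,
          "Don't know how to convert #{row.class.name} to an Array without the header columns being set."
  end

  row
end

columns = ["name", "zip_code"]
p row_to_array({ "zip_code" => "12345", "name" => "Jack", "extra" => 1 }, columns) # => ["Jack", "12345"]
p row_to_array(["Jill", "67890"], columns)                                           # => ["Jill", "67890"]
```
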

#to_hash(row, cleanse = true) ⇒ Object

Marshal to Hash from Array or Hash by applying this header.

Parameters:

cleanse [true|false]
  Whether to cleanse and narrow the supplied hash to just those columns in this header.
  Only applies when the row is already a Hash.
  Useful to turn off narrowing when the input data is already trusted.


# File 'lib/io_streams/tabular/header.rb', line 93

def to_hash(row, cleanse = true)
  return if IOStreams::Utils.blank?(row)

  case row
  when Array
    unless columns
      raise(IOStreams::Errors::InvalidHeader, "Missing mandatory header when trying to convert a row into a hash")
    end

    array_to_hash(row)
  when Hash
    cleanse && columns ? cleanse_hash(row) : row
  else
    raise(IOStreams::Errors::TypeMismatch, "Don't know how to convert #{row.class.name} to a Hash")
  end
end
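
For Array rows, #to_hash delegates to a private array_to_hash helper whose source is not shown on this page. Based on the skip_unknown note above, a plausible sketch pairs columns with values by position and leaves out rejected columns. Skipping blank column names is an additional assumption, and `array_to_hash_sketch` is a hypothetical name.

```ruby
IGNORE_PREFIX = "__rejected__".freeze

# Assumed behaviour of the private array_to_hash helper: pair each header
# column with the value at the same index, skipping blank and rejected columns.
def array_to_hash_sketch(row, columns)
  hash = {}
  columns.each_with_index do |column, index|
    next if column.nil? || column.empty? || column.start_with?(IGNORE_PREFIX)

    hash[column] = row[index]
  end
  hash
end

columns = ["name", "__rejected__Unknown Column", "zip_code"]
result  = array_to_hash_sketch(["Jack", "dropped", "12345"], columns)
p result.keys    # => ["name", "zip_code"]
p result["name"] # => "Jack"
```
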