Class: IOStreams::Tabular::Header
- Inherits: Object
- Defined in: lib/io_streams/tabular/header.rb
Overview
Process files / streams that start with a header.
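The header-first pattern can be illustrated without IOStreams: the first line supplies the column names, and each later line is paired with them by position. This is a minimal sketch, not the library's API. `rows_with_header` is a hypothetical helper, and it splits naively on commas, so quoted fields are not handled.

```ruby
# Hypothetical helper (not part of IOStreams): treats the first line of a
# simple comma-separated string as the header for all following lines.
# Naive split on "," only; quoted fields are not supported.
def rows_with_header(text)
  lines = text.each_line.map(&:chomp).reject(&:empty?)
  return [] if lines.empty?

  header = lines.shift.split(",")
  # Pair each header column with the value in the same position.
  lines.map { |line| header.zip(line.split(",")).to_h }
end

rows_with_header("name,age\nJack,21\nJill,20\n")
# => [{"name" => "Jack", "age" => "21"}, {"name" => "Jill", "age" => "20"}]
```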
Constant Summary
- IGNORE_PREFIX = "__rejected__".freeze
Column names that begin with this prefix have been rejected and should be ignored.
Instance Attribute Summary
- #allowed_columns ⇒ Object
Returns the value of attribute allowed_columns.
- #columns ⇒ Object
Returns the value of attribute columns.
- #required_columns ⇒ Object
Returns the value of attribute required_columns.
- #skip_unknown ⇒ Object
Returns the value of attribute skip_unknown.
Instance Method Summary
- #cleanse! ⇒ Object
Returns [Array<String>] the list of columns that were ignored during cleansing.
- #initialize(columns: nil, allowed_columns: nil, required_columns: nil, skip_unknown: true) ⇒ Header (constructor)
Creates a new Header.
- #to_array(row, cleanse = true) ⇒ Object
Marshal to Array from Hash or Array by applying this header.
- #to_hash(row, cleanse = true) ⇒ Object
Marshal to Hash from Array or Hash by applying this header.
Constructor Details
#initialize(columns: nil, allowed_columns: nil, required_columns: nil, skip_unknown: true) ⇒ Header
Creates a new Header.
Parameters
columns [Array<String>]
Columns in this header.
Note:
It is recommended to keep all columns as strings to avoid issues when persisting to MongoDB, which converts symbol keys to strings.
allowed_columns [Array<String>]
List of columns to allow.
Default: nil (allow all columns)
Note:
* So that rejected columns can be identified in subsequent steps, they will be prefixed with `__rejected__`.
For example, `Unknown Column` would be cleansed as `__rejected__Unknown Column`.
required_columns [Array<String>]
List of columns that must be present after cleansing; otherwise #cleanse! raises IOStreams::Errors::InvalidHeader.
skip_unknown [true|false]
Default: true
true:
Skip columns not present in allowed_columns by prefixing them with `__rejected__` during cleansing.
#to_hash will skip these additional columns entirely, as if they were not in the file at all.
false:
Raises IOStreams::Errors::InvalidHeader when a column is supplied that is not in allowed_columns.
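Because a rejected name keeps its original text behind the `__rejected__` prefix, later steps can separate it from accepted columns with a plain prefix check. A minimal sketch, assuming already-cleansed column names as input. The constant is copied locally so the snippet runs without the io_streams gem, and `partition_columns` is a hypothetical helper, not part of IOStreams.

```ruby
# Local copy of the prefix so this sketch runs without the io_streams gem.
IGNORE_PREFIX = "__rejected__".freeze

# Hypothetical helper: splits cleansed column names into [accepted, rejected],
# stripping the prefix so rejected names read as they appeared in the file.
def partition_columns(columns)
  accepted, rejected = columns.partition { |name| !name.start_with?(IGNORE_PREFIX) }
  [accepted, rejected.map { |name| name.delete_prefix(IGNORE_PREFIX) }]
end

partition_columns(["first_name", "__rejected__Unknown Column"])
# => [["first_name"], ["Unknown Column"]]
```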
```ruby
# File 'lib/io_streams/tabular/header.rb', line 35
def initialize(columns: nil, allowed_columns: nil, required_columns: nil, skip_unknown: true)
  @columns          = columns
  @required_columns = required_columns
  @allowed_columns  = allowed_columns
  @skip_unknown     = skip_unknown
end
```
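The recommendation to keep column names as strings follows from how Ruby hashes compare keys: `:name` and `"name"` are different keys. Once a store such as MongoDB has turned symbol keys into strings, symbol column names no longer find anything. A small illustration in plain Ruby, independent of IOStreams:

```ruby
columns_as_symbols = [:name, :age]
columns_as_strings = ["name", "age"]

# A row as it might look after a round trip through a store that
# converts symbol keys to strings.
row = { "name" => "Jack", "age" => 21 }

# Symbol columns miss every key, so every value comes back nil.
by_symbol = columns_as_symbols.map { |column| row[column] }
# String columns match the stored keys.
by_string = columns_as_strings.map { |column| row[column] }

by_symbol # => [nil, nil]
by_string # => ["Jack", 21]
```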
Instance Attribute Details
#allowed_columns ⇒ Object
Returns the value of attribute allowed_columns.
```ruby
# File 'lib/io_streams/tabular/header.rb', line 8
def allowed_columns
  @allowed_columns
end
```
#columns ⇒ Object
Returns the value of attribute columns.
```ruby
# File 'lib/io_streams/tabular/header.rb', line 8
def columns
  @columns
end
```
#required_columns ⇒ Object
Returns the value of attribute required_columns.
```ruby
# File 'lib/io_streams/tabular/header.rb', line 8
def required_columns
  @required_columns
end
```
#skip_unknown ⇒ Object
Returns the value of attribute skip_unknown.
```ruby
# File 'lib/io_streams/tabular/header.rb', line 8
def skip_unknown
  @skip_unknown
end
```
Instance Method Details
#cleanse! ⇒ Object
Returns [Array<String>] the list of columns that were ignored during cleansing. Returns an empty list without any checks when columns is nil or empty.
Each column is cleansed as follows:
- Leading and trailing whitespace is stripped.
- All characters are converted to lower case.
- Spaces and `-` are converted to `_`.
- All characters except letters, digits, and `_` are stripped.
Notes:
- So that rejected columns can be identified in subsequent steps, they are prefixed with `__rejected__`. For example, `Unknown Column` would be cleansed as `__rejected__Unknown Column`.
- Raises IOStreams::Errors::InvalidHeader when no columns are left after cleansing, that is, when every column was rejected.
- Also raises IOStreams::Errors::InvalidHeader when skip_unknown is false and any column was rejected, or when any of the required_columns is missing after cleansing.
```ruby
# File 'lib/io_streams/tabular/header.rb', line 54
def cleanse!
  return [] if columns.nil? || columns.empty?

  ignored_columns = []
  self.columns    = columns.collect do |column|
    cleansed = cleanse_column(column)
    if allowed_columns.nil? || allowed_columns.include?(cleansed)
      cleansed
    else
      ignored_columns << column
      "#{IGNORE_PREFIX}#{column}"
    end
  end

  if !skip_unknown && !ignored_columns.empty?
    raise(IOStreams::Errors::InvalidHeader, "Unknown columns after cleansing: #{ignored_columns.join(',')}")
  end

  if ignored_columns.size == columns.size
    raise(IOStreams::Errors::InvalidHeader, "All columns are unknown after cleansing: #{ignored_columns.join(',')}")
  end

  if required_columns
    missing_columns = required_columns - columns
    unless missing_columns.empty?
      raise(IOStreams::Errors::InvalidHeader, "Missing columns after cleansing: #{missing_columns.join(',')}")
    end
  end

  ignored_columns
end
```
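`cleanse!` relies on the private `cleanse_column` method, whose source is not shown on this page. The cleansing rules listed for #cleanse! can be approximated as below. This is a sketch written from that list, not the gem's implementation: `cleanse_column_sketch` is a hypothetical name, and treating "letters" as ASCII a to z and "spaces" as the space character only are assumptions.

```ruby
# Hypothetical approximation of the per-column cleansing rules.
def cleanse_column_sketch(name)
  cleansed = name.to_s.strip.downcase   # rules 1 and 2: trim, lower-case
  cleansed = cleansed.gsub(/[ -]/, "_") # rule 3: spaces and hyphens to "_"
  cleansed.gsub(/[^a-z0-9_]/, "")       # rule 4: drop everything else
end

cleanse_column_sketch("  First Name ")   # => "first_name"
cleanse_column_sketch("E-mail Address!") # => "e_mail_address"
```

Note that #cleanse! compares this cleansed form against allowed_columns, but prefixes the original, uncleansed name when a column is rejected.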
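#to_array(row, cleanse = true) ⇒ Object
Marshal to Array from Hash or Array by applying this header.
When row is a Hash and the header columns are set, the hash is cleansed first (unless cleanse is false) and its values are returned in column order. An Array row is returned unchanged. Any other input, including a Hash when the header columns are not set, raises IOStreams::Errors::TypeMismatch.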
```ruby
# File 'lib/io_streams/tabular/header.rb', line 110
def to_array(row, cleanse = true)
  if row.is_a?(Hash) && columns
    row = cleanse_hash(row) if cleanse
    row = columns.collect { |column| row[column] }
  end

  unless row.is_a?(Array)
    raise(
      IOStreams::Errors::TypeMismatch,
      "Don't know how to convert #{row.class.name} to an Array without the header columns being set."
    )
  end

  row
end
```
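For a Hash row, the core of #to_array is projecting the hash onto the header's column order. The sketch below isolates that projection and omits the `cleanse_hash` step; `hash_to_array` is a hypothetical name. Keys outside the columns are dropped and columns absent from the hash become nil, which is what `columns.collect { |column| row[column] }` yields for a plain Hash.

```ruby
# Hypothetical sketch: order a hash's values by the header columns.
# Keys not in columns are dropped; columns missing from the hash become nil.
def hash_to_array(columns, row)
  columns.map { |column| row[column] }
end

hash_to_array(["name", "age", "zip"], { "age" => 21, "name" => "Jack", "extra" => "x" })
# => ["Jack", 21, nil]
```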
#to_hash(row, cleanse = true) ⇒ Object
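Marshal to Hash from Array or Hash by applying this header.
Returns nil when row is blank. An Array row requires the header columns to be set, otherwise IOStreams::Errors::InvalidHeader is raised. Any type other than Array or Hash raises IOStreams::Errors::TypeMismatch.
Parameters:
cleanse [true|false]
Whether to cleanse and narrow the supplied hash to just those columns in this header.
Only applies when the row is already a Hash.
Useful to turn off narrowing when the input data is already trusted.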
```ruby
# File 'lib/io_streams/tabular/header.rb', line 93
def to_hash(row, cleanse = true)
  return if IOStreams::Utils.blank?(row)

  case row
  when Array
    unless columns
      raise(IOStreams::Errors::InvalidHeader, "Missing mandatory header when trying to convert a row into a hash")
    end

    array_to_hash(row)
  when Hash
    cleanse && columns ? cleanse_hash(row) : row
  else
    raise(IOStreams::Errors::TypeMismatch, "Don't know how to convert #{row.class.name} to a Hash")
  end
end
```
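#to_hash delegates Array rows to the private `array_to_hash` helper, whose source is not shown here. Based on the skip_unknown description, a plausible sketch pairs values with columns by position and drops columns that are blank or carry the `__rejected__` prefix. Treat that skipping rule as an assumption; `array_to_hash_sketch` is a hypothetical name, and the constant is copied locally so the snippet runs on its own.

```ruby
# Local copy of the prefix so this sketch runs without the io_streams gem.
IGNORE_PREFIX = "__rejected__".freeze

# Hypothetical sketch of converting an Array row using the header columns.
# Assumption: nil/empty column names and rejected columns are omitted.
def array_to_hash_sketch(columns, row)
  result = {}
  columns.each_with_index do |column, index|
    next if column.nil? || column.empty?
    next if column.start_with?(IGNORE_PREFIX)

    result[column] = row[index]
  end
  result
end

array_to_hash_sketch(["name", "__rejected__Notes", "age"], ["Jack", "n/a", 21])
# => {"name" => "Jack", "age" => 21}
```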