Class: IOStreams::Tabular::Header
- Inherits:
- Object
- Defined in:
- lib/io_streams/tabular/header.rb
Overview
Process files / streams that start with a header.
Instance Attribute Summary
-
#allowed_columns ⇒ Object
Returns the value of attribute allowed_columns.
-
#columns ⇒ Object
Returns the value of attribute columns.
-
#required_columns ⇒ Object
Returns the value of attribute required_columns.
-
#skip_unknown ⇒ Object
Returns the value of attribute skip_unknown.
Instance Method Summary
-
#cleanse! ⇒ Object
Returns [Array<String>]: the list of columns that were ignored during cleansing.
-
#initialize(columns: nil, allowed_columns: nil, required_columns: nil, skip_unknown: true) ⇒ Header
Constructor: returns a new Header.
-
#to_array(row, cleanse = true) ⇒ Object
-
#to_hash(row, cleanse = true) ⇒ Object
Marshal to Hash from Array or Hash by applying this header.
Constructor Details
#initialize(columns: nil, allowed_columns: nil, required_columns: nil, skip_unknown: true) ⇒ Header
Header
Parameters
columns [Array<String>]
Columns in this header.
Note:
It is recommended to keep all columns as strings to avoid issues when persisting
to MongoDB, which converts symbol keys to strings.
allowed_columns [Array<String>]
List of columns to allow.
Default: nil (allow all columns)
Note:
When supplied, any columns that are rejected are returned in the cleansed columns
as nil so that they can be ignored during processing.
required_columns [Array<String>]
List of columns that must be present after cleansing, otherwise IOStreams::Errors::InvalidHeader is raised.
skip_unknown [true|false]
true:
Skip columns not present in allowed_columns by cleansing them to nil.
#to_hash will skip these additional columns entirely as if they were not in the file at all.
false:
Raises IOStreams::Errors::InvalidHeader when a column is supplied that is not in allowed_columns.
# File 'lib/io_streams/tabular/header.rb', line 32
def initialize(columns: nil, allowed_columns: nil, required_columns: nil, skip_unknown: true)
  @columns = columns
  @required_columns = required_columns
  @allowed_columns = allowed_columns
  @skip_unknown = skip_unknown
end
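The interaction between allowed_columns and skip_unknown can be illustrated with a minimal standalone sketch. It does not use the library: filter_columns and UnknownColumnError are hypothetical names introduced here, and the sketch compares raw column names, whereas #cleanse! compares cleansed names.

```ruby
# Hypothetical stand-in for the documented filtering; not the library's code.
class UnknownColumnError < StandardError; end

# Returns [filtered_columns, ignored_columns].
def filter_columns(columns, allowed_columns: nil, skip_unknown: true)
  ignored = []
  filtered = columns.map do |column|
    if allowed_columns.nil? || allowed_columns.include?(column)
      column
    else
      ignored << column
      nil # a rejected column keeps its position but becomes nil
    end
  end

  # With skip_unknown: false, any rejected column is an error.
  if !skip_unknown && !ignored.empty?
    raise UnknownColumnError, "Unknown columns: #{ignored.join(',')}"
  end

  [filtered, ignored]
end

filtered, ignored = filter_columns(%w[name zip extra], allowed_columns: %w[name zip])
# filtered => ["name", "zip", nil]
# ignored  => ["extra"]
```

Keeping the nil placeholder preserves the position of every remaining column, so values in each data row still line up with the header.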
Instance Attribute Details
#allowed_columns ⇒ Object
Returns the value of attribute allowed_columns.
# File 'lib/io_streams/tabular/header.rb', line 5
def allowed_columns
  @allowed_columns
end
#columns ⇒ Object
Returns the value of attribute columns.
# File 'lib/io_streams/tabular/header.rb', line 5
def columns
  @columns
end
#required_columns ⇒ Object
Returns the value of attribute required_columns.
# File 'lib/io_streams/tabular/header.rb', line 5
def required_columns
  @required_columns
end
#skip_unknown ⇒ Object
Returns the value of attribute skip_unknown.
# File 'lib/io_streams/tabular/header.rb', line 5
def skip_unknown
  @skip_unknown
end
Instance Method Details
#cleanse! ⇒ Object
Returns [Array<String>]: the list of columns that were ignored during cleansing.
Each column is cleansed as follows:
-
Leading and trailing whitespace is stripped.
-
All characters converted to lower case.
-
Spaces and ‘-’ are converted to ‘_’.
-
All characters except for letters, digits, and ‘_’ are stripped.
Notes
-
Raises IOStreams::Errors::InvalidHeader when there are no non-nil columns left after cleansing.
# File 'lib/io_streams/tabular/header.rb', line 49
def cleanse!
  return [] if columns.nil? || columns.empty?

  ignored_columns = []
  self.columns = columns.collect do |column|
    cleansed = cleanse_column(column)
    if allowed_columns.nil? || allowed_columns.include?(cleansed)
      cleansed
    else
      ignored_columns << column
      nil
    end
  end

  if !skip_unknown && !ignored_columns.empty?
    raise(IOStreams::Errors::InvalidHeader, "Unknown columns after cleansing: #{ignored_columns.join(',')}")
  end

  if ignored_columns.size == columns.size
    raise(IOStreams::Errors::InvalidHeader, "All columns are unknown after cleansing: #{ignored_columns.join(',')}")
  end

  if required_columns
    missing_columns = required_columns - columns
    unless missing_columns.empty?
      raise(IOStreams::Errors::InvalidHeader, "Missing columns after cleansing: #{missing_columns.join(',')}")
    end
  end

  ignored_columns
end
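The four per-column cleansing rules listed for #cleanse! can be sketched as follows. The private cleanse_column helper is not shown on this page, so cleanse_column_sketch is a hypothetical approximation that assumes ASCII column names; the library's actual implementation may differ in detail.

```ruby
# Hypothetical approximation of the documented cleansing rules.
# Assumes ASCII column names; not the library's own implementation.
def cleanse_column_sketch(column)
  column.to_s
        .strip                     # 1. strip leading and trailing whitespace
        .downcase                  # 2. convert to lower case
        .gsub(/[ -]/, "_")         # 3. spaces and '-' become '_'
        .gsub(/[^a-z0-9_]/, "")    # 4. drop everything except letters, digits, '_'
end

cleanse_column_sketch(" First-Name (Legal) ") # => "first_name_legal"
cleanse_column_sketch("Zip Code#")            # => "zip_code"
```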
#to_array(row, cleanse = true) ⇒ Object
# File 'lib/io_streams/tabular/header.rb', line 105
def to_array(row, cleanse = true)
  if row.is_a?(Hash) && columns
    row = cleanse_hash(row) if cleanse
    row = columns.collect { |column| row[column] }
  end

  unless row.is_a?(Array)
    raise(
      IOStreams::Errors::TypeMismatch,
      "Don't know how to convert #{row.class.name} to an Array without the header columns being set."
    )
  end

  row
end
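The Hash branch of #to_array orders values by the header columns. A minimal standalone sketch of that step is shown here; hash_to_row is a hypothetical name and the cleanse_hash step is omitted.

```ruby
# Hypothetical helper mirroring `columns.collect { |column| row[column] }`.
def hash_to_row(row, columns)
  # Values come out in header order; keys missing from the hash yield nil.
  columns.map { |column| row[column] }
end

hash_to_row({ "zip" => "12345", "name" => "Jack" }, %w[name zip city])
# => ["Jack", "12345", nil]
```

The output order follows the header rather than the hash, so every row lines up with the same columns regardless of key order in the input.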
#to_hash(row, cleanse = true) ⇒ Object
Marshal to Hash from Array or Hash by applying this header
Parameters:
cleanse [true|false]
Whether to cleanse and narrow the supplied hash to just those columns in this header.
Only applies when the row is already a Hash.
Useful to turn off narrowing when the input data is already trusted.
# File 'lib/io_streams/tabular/header.rb', line 88
def to_hash(row, cleanse = true)
  return if IOStreams::Utils.blank?(row)

  case row
  when Array
    unless columns
      raise(IOStreams::Errors::InvalidHeader, "Missing mandatory header when trying to convert a row into a hash")
    end

    array_to_hash(row)
  when Hash
    cleanse && columns ? cleanse_hash(row) : row
  else
    raise(IOStreams::Errors::TypeMismatch, "Don't know how to convert #{row.class.name} to a Hash")
  end
end
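For the Array branch, the private array_to_hash helper is not shown on this page. The sketch below is an assumption about its behavior: values are paired with columns by position, and columns cleansed to nil are skipped, as the skip_unknown description suggests. array_to_hash_sketch is a hypothetical name.

```ruby
# Hypothetical approximation of the Array branch of #to_hash.
def array_to_hash_sketch(row, columns)
  hash = {}
  columns.each_with_index do |column, index|
    next if column.nil? # columns rejected during cleansing are skipped
    hash[column] = row[index]
  end
  hash
end

array_to_hash_sketch(["Jack", "ignored", "12345"], ["name", nil, "zip"])
# => {"name"=>"Jack", "zip"=>"12345"}
```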