Module: WorkerTools::XlsxInput
- Defined in:
- lib/worker_tools/xlsx_input.rb
Defined Under Namespace
Classes: XlsxInputForeach
Instance Method Summary collapse
-
#xlsx_input_columns ⇒ Object
If an array is provided, the names will be used as the row keys, the row values will be assign according to the columns order.
- #xlsx_input_columns_array_check(xlsx_rows_enum) ⇒ Object
- #xlsx_input_columns_check(xlsx_rows_enum) ⇒ Object
- #xlsx_input_columns_hash_check(xlsx_rows_enum) ⇒ Object
- #xlsx_input_columns_hash_check_duplicates(names) ⇒ Object
- #xlsx_input_columns_hash_check_missing(actual_names, expected_names) ⇒ Object
- #xlsx_input_file_path ⇒ Object
- #xlsx_input_foreach ⇒ Object
- #xlsx_input_header_normalized(name) ⇒ Object
-
#xlsx_input_include_other_columns ⇒ Object
If true, the rows will append those columns that don’t belong to the xlsx_input_columns list.
-
#xlsx_input_mapping_order(header_names) ⇒ Object
Compares the first row (header names) with the xlsx_input_columns hash to find the corresponding positions.
- #xlsx_input_mapping_order_for_hash(header_names) ⇒ Object
- #xlsx_input_mapping_order_with_other_columns(mapping, filtered_column_names) ⇒ Object
-
#xlsx_input_value_cleanup(value) ⇒ Object
Allows for some basic cleanup of the values, such as applying strip to the strings.
- #xlsx_rows_enum ⇒ Object
Instance Method Details
#xlsx_input_columns ⇒ Object
If an array is provided, the names will be used as the row keys, the row values will be assign according to the columns order.
Ex: %w(tenant segment area) row =>
tenant: _value_at_first_column_,
segment: _value_at_second_column_,
area: _value_at_third_column_
If a hash if provided, the keys will turn into the row keys, the values will be used to find the corresponding columns (the order in the spreadsheet won’t affect the import)
Ex: { tenant: ‘Mandant’, segment: ‘Segment’, area: ‘Bereich’) row =>
tenant: _value_at_column_Mandant,
segment: _value_at_column_Segment,
area: _value_at_column_Bereich
The name of the column is filtered using the xlsx_input_header_normalized method, which takes care of extra spaces and looks for a case insentive match (so ‘Bereich’ matches ‘ Bereich’, ‘bereich’, etc.). You can override that method as well.
Besides matching the columns using strings, it is possible to use a regular expression or a proc: {
tenant: 'Mandant',
segment: /Segment/i,
area: ->(name) { name.downcase == 'area' }
}
38 39 40 |
# File 'lib/worker_tools/xlsx_input.rb', line 38 def xlsx_input_columns raise "xlsx_input_columns has to be defined in #{self}" end |
#xlsx_input_columns_array_check(xlsx_rows_enum) ⇒ Object
66 67 68 69 70 71 72 |
# File 'lib/worker_tools/xlsx_input.rb', line 66 def xlsx_input_columns_array_check(xlsx_rows_enum) expected_columns_length = xlsx_input_columns.length actual_columns_length = xlsx_rows_enum.first.length return if expected_columns_length == actual_columns_length raise "The number of columns (#{actual_columns_length}) is not the expected (#{expected_columns_length})" end |
#xlsx_input_columns_check(xlsx_rows_enum) ⇒ Object
59 60 61 62 63 64 |
# File 'lib/worker_tools/xlsx_input.rb', line 59 def xlsx_input_columns_check(xlsx_rows_enum) # override and return true if you do not want this check to be performed return xlsx_input_columns_array_check(xlsx_rows_enum) if xlsx_input_columns.is_a?(Array) xlsx_input_columns_hash_check(xlsx_rows_enum) end |
#xlsx_input_columns_hash_check(xlsx_rows_enum) ⇒ Object
74 75 76 77 78 79 |
# File 'lib/worker_tools/xlsx_input.rb', line 74 def xlsx_input_columns_hash_check(xlsx_rows_enum) expected_names = xlsx_input_columns.values filtered_actual_names = xlsx_rows_enum.first.map { |n| xlsx_input_header_normalized(n) } xlsx_input_columns_hash_check_duplicates(filtered_actual_names) xlsx_input_columns_hash_check_missing(filtered_actual_names, expected_names) end |
#xlsx_input_columns_hash_check_duplicates(names) ⇒ Object
81 82 83 84 |
# File 'lib/worker_tools/xlsx_input.rb', line 81 def xlsx_input_columns_hash_check_duplicates(names) dups = names.group_by(&:itself).select { |_, v| v.count > 1 }.keys raise "The file contains duplicated columns: #{dups}" if dups.present? end |
#xlsx_input_columns_hash_check_missing(actual_names, expected_names) ⇒ Object
86 87 88 89 90 91 92 |
# File 'lib/worker_tools/xlsx_input.rb', line 86 def xlsx_input_columns_hash_check_missing(actual_names, expected_names) missing = expected_names.reject do |name| matchable = name.is_a?(String) ? xlsx_input_header_normalized(name) : name actual_names.any? { |n| case n when matchable then true end } # rubocop does not like === end raise "Some columns are missing: #{missing}" unless missing.empty? end |
#xlsx_input_file_path ⇒ Object
125 126 127 |
# File 'lib/worker_tools/xlsx_input.rb', line 125 def xlsx_input_file_path model..path.to_s end |
#xlsx_input_foreach ⇒ Object
136 137 138 139 140 141 142 143 144 145 146 |
# File 'lib/worker_tools/xlsx_input.rb', line 136 def xlsx_input_foreach @xlsx_input_foreach ||= begin xlsx_input_columns_check(xlsx_rows_enum) XlsxInputForeach.new( rows_enum: xlsx_rows_enum, mapping_order: xlsx_input_mapping_order(xlsx_rows_enum.first), cleanup_method: method(:xlsx_input_value_cleanup) ) end end |
#xlsx_input_header_normalized(name) ⇒ Object
49 50 51 |
# File 'lib/worker_tools/xlsx_input.rb', line 49 def xlsx_input_header_normalized(name) name.to_s.strip.downcase end |
#xlsx_input_include_other_columns ⇒ Object
If true, the rows will append those columns that don’t belong to the xlsx_input_columns list. Useful when the spreadsheet contains some fixed columns and a number of variable ones.
45 46 47 |
# File 'lib/worker_tools/xlsx_input.rb', line 45 def xlsx_input_include_other_columns false end |
#xlsx_input_mapping_order(header_names) ⇒ Object
Compares the first row (header names) with the xlsx_input_columns hash to find the corresponding positions.
Ex: xlsx_input_columns: ‘Mandant’, area: ‘Bereich’
headers: ['Bereich', 'Mandant']
=> { tenant: 1, area: 0}
100 101 102 103 104 |
# File 'lib/worker_tools/xlsx_input.rb', line 100 def xlsx_input_mapping_order(header_names) return xlsx_input_columns.map.with_index { |n, i| [n, i] }.to_h if xlsx_input_columns.is_a?(Array) xlsx_input_mapping_order_for_hash(header_names) end |
#xlsx_input_mapping_order_for_hash(header_names) ⇒ Object
106 107 108 109 110 111 112 113 114 115 |
# File 'lib/worker_tools/xlsx_input.rb', line 106 def xlsx_input_mapping_order_for_hash(header_names) filtered_column_names = header_names.map { |n| xlsx_input_header_normalized(n) } mapping = xlsx_input_columns.each_with_object({}) do |(k, v), h| matchable = v.is_a?(String) ? xlsx_input_header_normalized(v) : v h[k] = filtered_column_names.index { |n| case n when matchable then true end } end return mapping unless xlsx_input_include_other_columns xlsx_input_mapping_order_with_other_columns(mapping, filtered_column_names) end |
#xlsx_input_mapping_order_with_other_columns(mapping, filtered_column_names) ⇒ Object
117 118 119 120 121 122 123 |
# File 'lib/worker_tools/xlsx_input.rb', line 117 def xlsx_input_mapping_order_with_other_columns(mapping, filtered_column_names) positions_taken = mapping.values filtered_column_names.each_with_index do |header, index| mapping[header] = index unless positions_taken.include?(index) end mapping end |
#xlsx_input_value_cleanup(value) ⇒ Object
Allows for some basic cleanup of the values, such as applying strip to the strings.
55 56 57 |
# File 'lib/worker_tools/xlsx_input.rb', line 55 def xlsx_input_value_cleanup(value) value.is_a?(String) ? value.strip : value end |
#xlsx_rows_enum ⇒ Object
129 130 131 132 133 134 |
# File 'lib/worker_tools/xlsx_input.rb', line 129 def xlsx_rows_enum @xlsx_rows_enum ||= begin spreadsheet = Roo::Excelx.new(xlsx_input_file_path) spreadsheet.each_row_streaming(sheet: spreadsheet.sheets.first, pad_cells: true) end end |