Method: DhEasy::Text.parse_vertical_table

Defined in:
lib/dh_easy/text.rb

.parse_vertical_table(opts = {}) {|data, row, header_map| ... } ⇒ Hash{Symbol => Array,Hash,nil}

Parse data from a vertical, table-like structure matching the given selectors,
using a header map to match columns.

Parameters:

  • opts (Hash) (defaults to: {})

    Configuration options.

Options Hash (opts):

  • :html (Nokogiri::Element)

    Container element to search in.

  • :row_selector (String)

    Selector for vertical row-like elements.

  • :header_selector (String)

    Header column elements selector.

  • :header_key_label_map (Hash{Symbol,String => Regexp,String})

    Header key vs. label dictionary to match column indexes.

  • :content_selector (String)

    Content row elements selector.

  • :column_parsers (Hash{Symbol,String => lambda,proc}) — default: {}

    Custom column parsers for advanced data extraction.

  • :ignore_text_nodes (Boolean) — default: true

    Ignore text nodes when retrieving cells and rows.
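
As a rough illustration of how these options fit together, the sketch below builds an options hash for a hypothetical table in which each `tr` holds one `th` label and one `td` value. The selectors, keys, and label patterns are assumptions made for this example only. The parser lambda follows the `(content_element, data, key)` call shape used in the method source, and `fake_cell` is a stand-in so the snippet runs without Nokogiri.

```ruby
# Stand-in for the node set a content selector would return (hypothetical;
# real code would receive Nokogiri nodes, which also respond to #text).
fake_cell = Struct.new(:text)

# Custom parser: invoked as parser.call(content_element, data, key) and
# expected to store its result in the shared data hash itself.
price_parser = lambda do |content_element, data, key|
  data[key] = content_element.text.gsub(/[^\d.]/, '').to_f
end

options = {
  html: nil,                  # replace with the container element to search
  row_selector: 'tr',         # assumed markup: one row per field
  header_selector: 'th',      # assumed markup: label cell
  content_selector: 'td',     # assumed markup: value cell
  header_key_label_map: {
    name: 'Name',             # String label
    price: /price/i           # Regexp label
  },
  column_parsers: { price: price_parser },
  ignore_text_nodes: true
}

# Exercise the custom parser on its own, outside the library.
data = {}
options[:column_parsers][:price].call(fake_cell.new('$ 19.90'), data, :price)
```

Keys in `:column_parsers` are looked up with the key resolved from `:header_key_label_map`, so the two hashes should share the same key names.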

Yield Parameters:

  • data (Hash{Symbol,String => Object})

    Parsed content row data.

  • row (Array)

    Raw content row data.

  • header_map (Hash{Symbol,String => Integer})

    Header map used.

Yield Returns:

  • (Boolean)

    `true` when valid, else `false`.

Returns:

  • (Hash{Symbol => Array,Hash,nil})

    Hash data is as follows:

    • `[Hash] :header_map` Header map used.

    • `[Array<Hash>,nil] :data` Parsed rows data.



# File 'lib/dh_easy/text.rb', line 276

def self.parse_vertical_table opts = {}, &filter
  opts = {
    html: nil,
    row_selector: nil,
    header_selector: nil,
    header_key_label_map: {},
    content_selector: nil,
    column_parsers: {},
    ignore_text_nodes: true
  }.merge opts
  return nil if opts[:html].nil?

  # Setup config
  data = {}
  dictionary = opts[:header_key_label_map]
  column_parsers = opts[:column_parsers]

  # Extract headers and content
  html_rows = opts[:html].css(opts[:row_selector]) rescue nil
  return nil if html_rows.nil?
  html_rows.each do |row|
    # Parse and map column header
    header_element = row.css(opts[:header_selector])
    key = translate_label_to_key header_element, dictionary
    next if key.nil? || key == ''

    # Parse column html with default or custom parser
    content_element = row.css(opts[:content_selector])
    column_parsers[key].nil? ?
      default_parser(content_element, data, key) :
      column_parsers[key].call(content_element, data, key)
  end
  data
end
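
The row loop above folds every matched row into a single hash and skips rows whose label has no key. The sketch below mimics that flow with plain Ruby objects so it can run without Nokogiri. Everything in it is an assumption for illustration: `row_struct`, `label_to_key`, and `parse_rows` are hypothetical stand-ins, and neither the label matching nor the default "stripped text" behavior is taken from the library's `translate_label_to_key` or `default_parser`.

```ruby
# Stand-in for a table row (hypothetical); real rows are Nokogiri nodes.
row_struct = Struct.new(:label, :value)

# Hypothetical label lookup, NOT the library's translate_label_to_key:
# returns the first key whose String or Regexp label matches.
label_to_key = lambda do |label, dictionary|
  found = dictionary.find do |_key, pattern|
    pattern.is_a?(Regexp) ? label =~ pattern : label.strip == pattern
  end
  found && found.first
end

# Same shape as the loop in the source: resolve the key, skip unknown
# labels, then dispatch to a custom parser or a default.
parse_rows = lambda do |rows, dictionary, column_parsers|
  data = {}
  rows.each do |row|
    key = label_to_key.call(row.label, dictionary)
    next if key.nil? || key == ''

    parser = column_parsers[key]
    if parser.nil?
      data[key] = row.value.strip # assumed default: stripped text
    else
      parser.call(row.value, data, key)
    end
  end
  data
end

rows = [
  row_struct.new('Name', ' Widget '),
  row_struct.new('Unit price', '$ 19.90'),
  row_struct.new('Internal note', 'not mapped')
]
dictionary = { name: 'Name', price: /price/i }
parsers = {
  price: ->(value, data, key) { data[key] = value.gsub(/[^\d.]/, '').to_f }
}

result = parse_rows.call(rows, dictionary, parsers)
```

In this sketch the third row is dropped because its label matches nothing in the dictionary, which mirrors the `next if key.nil? || key == ''` guard in the source.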