Class: MineShaft::HTMLTable

Inherits:
Object
Defined in:
lib/mine_shaft/html_table.rb

Overview

Provides several convenience methods for translating a (machinist-) parsed HTML table into standard Ruby data structures. Every table is assumed to have a “heading” row as its first row, and that row is assumed to use <td> elements (instead of <th>).


Constructor Details

#initialize(parsed_table) ⇒ HTMLTable

Public: Initialize a new HTMLTable with the specified table data, as parsed by machinist (or Nokogiri).

parsed_table - A Nokogiri::HTML::Document or Nokogiri::XML::Element scoped to only the HTML table you are interested in. Technically speaking, you could pass in more content than just the <table> element and it would likely work fine, but a single <table> is the anticipated content structure.

Returns an instance of HTMLTable.



# File 'lib/mine_shaft/html_table.rb', line 17

def initialize(parsed_table)
  @table = parsed_table
end

Instance Method Details

#content_rows ⇒ Object

Public: Retrieve the content of all the <td> elements from the table, except for the first row.

Returns an Array of Array elements, each one being the content from one row of the table. The returned content does NOT include the first row, as it is assumed to be the heading of the table.


# File 'lib/mine_shaft/html_table.rb', line 27

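def content_rows
  # Skip the heading row (the first column_count cells).
  table_content = td_elements[column_count, td_elements.size]
  # Group the remaining cells into rows. each_slice replaces enum_slice,
  # which is only available on Ruby 1.8.
  table_content.each_slice(column_count).to_a
end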
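The slicing step can be sketched on a plain Array, without Nokogiri. The method name `slice_into_rows` and its arguments are hypothetical stand-ins for illustration: in the class itself the cells come from #td_elements, and `column_count` is a helper whose definition is not shown on this page.

```ruby
# Hypothetical helper mirroring the logic of #content_rows on a flat Array.
# cells        - Array of cell contents, heading row first.
# column_count - assumed number of columns in the table.
def slice_into_rows(cells, column_count)
  # Drop the heading row, then group what is left into rows.
  body_cells = cells[column_count, cells.size]
  body_cells.each_slice(column_count).to_a
end

cells = ["Name", "Number", "John", "123-456-7890", "Jane", "555-000-1111"]
slice_into_rows(cells, 2)
# => [["John", "123-456-7890"], ["Jane", "555-000-1111"]]
```

Because the cells are grouped purely by position, this approach presumably relies on every row having the same number of <td> elements.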

#deserialize ⇒ Object

Public: Converts HTML table to an Array of Hash objects, using the column headings as keys for each Hash element.

Examples

Given 'names' was initialized with the following table:

---------------------
|Name  |Number      |
---------------------
|John  |123-456-7890|
---------------------

names.deserialize
# => [{:name => "John", :number => "123-456-7890"}]

Returns an Array of Hash objects. Each Hash element is a key-value mapping of "table header"-"row content". (Note that the key is a downcased Symbol of the heading value.)


# File 'lib/mine_shaft/html_table.rb', line 51

def deserialize
  content_rows.map do |row_cells|
    symbolized_headings.inject({}) do |all_attributes, current_attribute|
      index_of_header = symbolized_headings.index(current_attribute)
      value = row_cells[index_of_header]
      all_attributes.merge({current_attribute.to_sym => value})
    end
  end
end
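The pairing of headings with row values can also be sketched with `Array#zip`. This is an alternative formulation for illustration, not the method's actual source; `rows_to_hashes` and its arguments are hypothetical, with plain Arrays standing in for the results of #headings and #content_rows.

```ruby
# Hypothetical helper: pair each row's cells with the symbolized headings.
# headings - Array of heading Strings (stand-in for #headings).
# rows     - Array of Arrays of cell Strings (stand-in for #content_rows).
def rows_to_hashes(headings, rows)
  keys = headings.map { |heading| heading.downcase.to_sym }
  rows.map { |row_cells| Hash[keys.zip(row_cells)] }
end

rows_to_hashes(["Name", "Number"], [["John", "123-456-7890"]])
# => [{:name => "John", :number => "123-456-7890"}]
```

The two formulations differ when a table repeats a heading: the index-based lookup in the source always reads the first column with that heading, while the zip-based sketch keeps the value from the last one.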

#headings ⇒ Object Also known as: headers

Public: Retrieves the content from the <td> elements of the first row of the table.

Returns an Array of the content contained in each <td> element of the first row.


# File 'lib/mine_shaft/html_table.rb', line 73

def headings
  td_elements.slice(0,column_count)
end

#symbolized_headings ⇒ Object

Public: Converts the return value of #headings to an Array of lower-cased Symbol elements.

Returns an Array of Symbol elements.



# File 'lib/mine_shaft/html_table.rb', line 82

def symbolized_headings
  headings.map {|header| header.downcase.to_sym}
end
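A small sketch of the conversion on plain Strings shows what the resulting Symbols look like. The helper name `symbolize_headings` is hypothetical; it applies the same `downcase.to_sym` chain to an Array passed in directly.

```ruby
# Hypothetical helper applying the same conversion to an Array of Strings.
def symbolize_headings(headings)
  headings.map { |header| header.downcase.to_sym }
end

symbolize_headings(["Name", "Phone Number"])
# => [:name, :"phone number"]
```

Only the case changes: whitespace inside a heading is preserved, so a multi-word heading becomes a quoted Symbol rather than an underscored one.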

#td_elements ⇒ Object

Public: Retrieves the content from all <td> elements in the table.

Returns an Array of the content contained in each <td> element.



# File 'lib/mine_shaft/html_table.rb', line 64

def td_elements
  @table.search("td").map(&:content)
end
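Judging from the source above, the only thing this method asks of the parsed table is that it responds to `search("td")` and that each result responds to `content`. The sketch below illustrates that contract with stand-in objects; `FakeCell`, `FakeTable`, and `td_contents` are hypothetical names, not part of MineShaft or Nokogiri.

```ruby
# Hypothetical stand-in for a parsed <td> node: it only exposes #content.
FakeCell = Struct.new(:content)

# Hypothetical stand-in for a parsed table or document.
class FakeTable
  def initialize(cells)
    @cells = cells
  end

  # Mimics a node lookup; only the "td" selector is supported here.
  def search(selector)
    selector == "td" ? @cells : []
  end
end

# Same call chain as #td_elements, taking the table as an argument.
def td_contents(table)
  table.search("td").map(&:content)
end

table = FakeTable.new([FakeCell.new("Name"), FakeCell.new("John")])
td_contents(table)
# => ["Name", "John"]
```

Stand-ins like these can be handy in unit tests that should not depend on parsing real HTML.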