Class: Daru::IO::Importers::HTML

Inherits:
Base
  • Object
Defined in:
lib/daru/io/importers/html.rb

Overview

Note:

This module works only for static table elements on an HTML page. It does not work when the table data is loaded into the page by inline JavaScript.

HTML Importer class, which adds the read_html method to Daru::DataFrame.

Class Method Summary

Instance Method Summary

Methods inherited from Base

from, guess_parse

Methods inherited from Base

#optional_gem

Constructor Details

#initialize ⇒ HTML

Checks for required gem dependencies of HTML Importer



# File 'lib/daru/io/importers/html.rb', line 16

def initialize
  require 'open-uri'
  optional_gem 'nokogiri'
end
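
The optional_gem helper is inherited from Base, and its source is not shown on this page. As a rough sketch of the optional-dependency pattern its name suggests, a helper of this kind might look like the following. The name optional_gem_sketch and the error message are hypothetical, not daru-io's actual code.

```ruby
# Hypothetical stand-in for Base#optional_gem (the real helper is not shown here).
# Tries to load a library and raises a more helpful error if it is missing.
def optional_gem_sketch(name)
  require name
  true
rescue LoadError
  raise LoadError, "Please install the '#{name}' gem to use this importer."
end

# 'set' ships with Ruby, so this call succeeds without installing anything.
optional_gem_sketch('set')
```

Deferring the require to the constructor means the dependency is only needed by users who actually instantiate this importer.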

Class Method Details

.read(path) ⇒ Daru::IO::Importers::HTML

Reads from an HTML file or a website

Examples:

Reading from a website URL

instance = Daru::IO::Importers::HTML.read('http://www.moneycontrol.com/')

Parameters:

  • path (String)

    Website URL or path to the HTML file from which the DataFrame is to be imported.

Returns:

  • (Daru::IO::Importers::HTML)

# File 'lib/daru/io/importers/html.rb', line 32

def read(path)
  @file_data = Nokogiri.parse(open(path).read)
  self
end
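
On Ruby 3.0 and later, open-uri no longer overrides Kernel#open, so a bare open(path) call does not fetch URLs there; URI.open is the replacement. The sketch below shows one way to read either a URL or a local path under that constraint. read_source is a hypothetical helper, not part of daru-io, and the simple http/https prefix check is an assumption made for illustration.

```ruby
require 'open-uri'
require 'tempfile'

# Hypothetical helper (not part of daru-io): returns the raw HTML text
# for either an http(s) URL or a path on the local filesystem.
def read_source(path)
  if path.match?(%r{\Ahttps?://}i)
    URI.open(path, &:read) # open-uri entry point for remote resources
  else
    File.read(path)
  end
end

# Demo with a local file, so no network access is needed.
Tempfile.create(['sample', '.html']) do |file|
  file.write('<table><tr><td>1</td></tr></table>')
  file.flush
  puts read_source(file.path)
end
```

The returned string could then be handed to Nokogiri.parse, as the listing above does.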

Instance Method Details

#call(match: nil, order: nil, index: nil, name: nil) ⇒ Array<Daru::DataFrame>

Imports an Array of Daru::DataFrames from an HTML Importer instance

Examples:

Importing with matching tables

list_of_dfs = instance.call(match: 'Sun Pharma')
list_of_dfs.count
#=> 4

df = list_of_dfs.first

# As the website changes every day, the output might not be exactly
# the same as the one shown below. Nevertheless, a Daru::DataFrame
# should be obtained (as long as 'Sun Pharma' is present on the website).

#=> <Daru::DataFrame(5x4)>
#        Company      Price     Change Value (Rs
#   0 Sun Pharma     502.60     -65.05   2,117.87
#   1   Reliance    1356.90      19.60     745.10
#   2 Tech Mahin     379.45     -49.70     650.22
#   3        ITC     315.85       6.75     621.12
#   4       HDFC    1598.85      50.95     553.91

Parameters:

  • match (String) (defaults to: nil)

    A String used to select matching table(s) from the tables on an HTML page.

  • index (Array or Daru::Index or Daru::MultiIndex) (defaults to: nil)

    If given, it overrides the parsed index. See the :index option of Daru::DataFrame#initialize.

  • order (Array or Daru::Index or Daru::MultiIndex) (defaults to: nil)

    If given, it overrides the parsed order. See the :order option of Daru::DataFrame#initialize.

  • name (String) (defaults to: nil)

    The name of the imported Daru::DataFrame is not parsed automatically by the module, so users can set the name attribute of their Daru::DataFrame manually through this option.

    See the :name option of Daru::DataFrame#initialize.

Returns:

  • (Array<Daru::DataFrame>)

# File 'lib/daru/io/importers/html.rb', line 75

def call(match: nil, order: nil, index: nil, name: nil)
  @match   = match
  @options = {name: name, index: index, order: order}

  @file_data
    .search('table')
    .map { |table| parse_table(table) }
    .compact
    .keep_if { |table| satisfy_dimension(table) && search(table) }
    .map { |table| decide_values(table, @options) }
    .map { |table| table_to_dataframe(table) }
end
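
The private helpers used in this chain (parse_table, satisfy_dimension, search, decide_values, table_to_dataframe) are not shown on this page. The plain-Ruby sketch below only illustrates the general shape of the compact-then-filter steps. The Hash layout for a table, the helper names non_empty? and matches?, and the rule that a nil match keeps every table are all assumptions, and select is used in place of keep_if to avoid mutating the input.

```ruby
# Each "table" is a Hash standing in for whatever parse_table returns (assumed).
TABLES = [
  { headers: %w[Company Price], data: [['Sun Pharma', '502.60'], ['ITC', '315.85']] },
  nil,                                  # e.g. a table that could not be parsed
  { headers: %w[City Temp], data: [['Pune', '31']] },
  { headers: [], data: [] }             # an empty table
].freeze

# Assumed dimension check: a usable table has at least one data row.
def non_empty?(table)
  !table[:data].empty?
end

# Assumed match rule: keep the table if any cell contains the match string.
def matches?(table, match)
  return true if match.nil?

  (table[:headers] + table[:data].flatten).any? { |cell| cell.include?(match) }
end

def select_tables(tables, match: nil)
  tables
    .compact
    .select { |table| non_empty?(table) && matches?(table, match) }
end

selected = select_tables(TABLES, match: 'Sun Pharma')
puts selected.length
puts selected.first[:headers].inspect
```

In this sketch only the first table survives the 'Sun Pharma' filter; with no match argument, both non-empty tables are kept.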

#read(path) ⇒ Daru::IO::Importers::HTML

Reads from an HTML file or a website

Examples:

Reading from a website URL

instance = Daru::IO::Importers::HTML.read('http://www.moneycontrol.com/')

Parameters:

  • path (String)

    Website URL or path to the HTML file from which the DataFrame is to be imported.

Returns:

  • (Daru::IO::Importers::HTML)

# File 'lib/daru/io/importers/html.rb', line 32

def read(path)
  @file_data = Nokogiri.parse(open(path).read)
  self
end