Class: Faceoff::Pagelet

Inherits:
Object
Defined in:
lib/faceoff/pagelet.rb

Overview

The Pagelet class extracts Facebook's dynamic content, which is delivered inside inline JavaScript, from a given HTML page.

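Class Method Summary

.parse(html, type = nil) ⇒ Object
Parses an HTML string and returns its pagelets as Nokogiri::HTML::Document objects.

.regex_for(name) ⇒ Object
Returns a regex to retrieve the given pagelet.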

Class Method Details

.parse(html, type = nil) ⇒ Object

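Parses an HTML string and returns a hash of Nokogiri::HTML::Document objects, keyed by pagelet name (as a symbol). When type is given, only that pagelet is searched for and its document is returned on its own. Returns nil if no pagelet matches: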

Pagelet.parse html
#=> {:profile_photo => <#OBJ>, :top_bar => <#OBJ>...}

Pagelet.parse html, :profile_photo
#=> <#OBJ>
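The difference between the hash result and the single-document result can be checked without Facebook or Nokogiri. The sketch below defines a hypothetical `parse_sketch` method that reuses the same pattern and control flow as `Pagelet.parse`, but keeps each decoded pagelet as a plain string instead of a `Nokogiri::HTML::Document`, so it runs with only Ruby's standard `json` library. The sample `page` is invented to fit the regex; it is not captured from a real Facebook page.

```ruby
require 'json'

# Hypothetical stand-in for Faceoff::Pagelet.parse: same regex and control
# flow, but values are decoded HTML strings rather than Nokogiri documents.
def parse_sketch(html, type = nil)
  name_pattern = type || "\\w+"
  regex = %r{<script>.*"pagelet_(#{name_pattern})":"(.*)"\},"page_cache":.*\}\);</script>}

  pagelets = nil
  html.scan(regex).each do |name, escaped|
    decoded = JSON.parse("[\"#{escaped}\"]").first
    return decoded if type            # a specific pagelet was requested
    pagelets ||= {}
    pagelets[name.to_sym] = decoded
  end
  pagelets                            # nil when nothing matched
end

# Invented sample markup: one pagelet script per line.
page = [
  '<script>x({"content":{"pagelet_top_bar":"<div>Top<\/div>"},"page_cache":true});</script>',
  '<script>x({"content":{"pagelet_profile_photo":"<img src=\"me.jpg\">"},"page_cache":true});</script>'
].join("\n")

p parse_sketch(page)                  # hash keyed by pagelet name
p parse_sketch(page, :profile_photo)  # just that pagelet
p parse_sketch(page, :missing)        # nil
```

Returning early from inside the `each` block is what makes the typed call hand back a single value; without a type, the method keeps collecting into the hash.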


# File 'lib/faceoff/pagelet.rb', line 18

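# Depends on the json and nokogiri libraries being loaded.
def self.parse html, type=nil
  pagelet = nil

  # Each match is [pagelet_name, json_escaped_html].
  matches = html.scan regex_for(type)

  matches.each do |name, escaped_html|
    # The captured HTML is the body of a JSON string literal; wrapping it in
    # a one-element JSON array lets JSON.parse undo the escaping.
    content  = JSON.parse("[\"#{escaped_html}\"]").first
    html_doc = Nokogiri::HTML.parse content

    # A specific pagelet was requested: return its document directly.
    return html_doc if type

    pagelet ||= {}
    pagelet[name.to_sym] = html_doc
  end

  # nil when no pagelet matched
  pagelet
end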
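The decoding step can be tried on its own. The captured pagelet HTML is still the escaped body of a JSON string (`\"`, `\/`, `\uXXXX`), and `parse` wraps it in `["..."]` before calling `JSON.parse`, presumably because older versions of Ruby's json library rejected a bare top-level string. The helper name `unescape_json_fragment` is hypothetical; it simply isolates that one line.

```ruby
require 'json'

# Hypothetical helper isolating the decoding line from Pagelet.parse:
# decode the body of a JSON string literal (given without its quotes).
def unescape_json_fragment(fragment)
  JSON.parse("[\"#{fragment}\"]").first
end

# Single quotes keep the backslashes, as they appear in the page source.
escaped = '<a href=\"\/profile.php?id=1\">Caf\u00e9<\/a>'
puts unescape_json_fragment(escaped)
```

A fragment containing an unescaped double quote would make the wrapped text invalid JSON, and `JSON.parse` would raise `JSON::ParserError`.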

.regex_for(name) ⇒ Object

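Returns a regex that matches an inline <script> element carrying the given pagelet. Capture group 1 is the pagelet name and group 2 is the pagelet's HTML, still JSON-escaped. When name is nil, any pagelet name (\w+) matches.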

# File 'lib/faceoff/pagelet.rb', line 39

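def self.regex_for name
  # With no name given, match any pagelet name.
  name ||= "\\w+"
  # Group 1: pagelet name; group 2: the pagelet's JSON-escaped HTML.
  %r{<script>.*"pagelet_(#{name})":"(.*)"\},"page_cache":.*\}\);</script>}
end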
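To see what the pattern actually captures, the sketch below copies `regex_for` as a top-level method and scans a few invented script lines shaped to satisfy it (they are not real Facebook markup). The variable names `one` and `two` are illustrative only.

```ruby
# Standalone copy of Pagelet.regex_for so this sketch runs without the gem.
def regex_for(name)
  name ||= "\\w+"
  %r{<script>.*"pagelet_(#{name})":"(.*)"\},"page_cache":.*\}\);</script>}
end

# Invented script lines shaped to fit the pattern.
one = '<script>x({"content":{"pagelet_top_bar":"<div>Top<\/div>"},"page_cache":true});</script>'
two = '<script>x({"content":{"pagelet_a":"A"},"page_cache":true});</script>' \
      '<script>x({"content":{"pagelet_b":"B"},"page_cache":true});</script>'

p one.scan(regex_for(nil))            # any pagelet name
p one.scan(regex_for(:top_bar))       # only pagelet_top_bar
p one.scan(regex_for(:profile_photo)) # no match
p two.scan(regex_for(nil))            # two scripts on one line
```

Two properties of the pattern show up here. Without the `m` flag, `.` in a Ruby regex does not match a newline, so a pagelet is found only when its whole `<script>` element sits on one line. And because the leading `.*` is greedy, at most one pagelet per line is captured: the last one.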