Class: Anaximander::Page
- Inherits:
- Object
- Object
- Anaximander::Page
- Includes:
- Comparable
- Defined in:
- lib/anaximander/page.rb
Overview
Represents a single page of a website being crawled. Exposes the assets and links on the page.
Errors
`Anaximander::Page` raises a `PageNotAccessibleError` when the page cannot be fetched. Common causes are a missing page (404), SSL errors, and infinite redirect loops.
Example
page = Page.new("http://example.com")
page.links # => ["http://www.iana.org/domains/example"]
page.assets # => ["/main.css", "/default.js"]
Instance Attribute Summary
- #children ⇒ Object
  Collection of `Page` objects that are linked to from the current page.
- #html ⇒ Object (readonly)
  Parsed Nokogiri HTML document.
- #url ⇒ Object (readonly)
  Absolute URL of the page.
Instance Method Summary
- #<=>(other) ⇒ Object
- #assets ⇒ Object
- #initialize(url) ⇒ Page (constructor)
  Fetches and parses the page at `url`.
- #inspect ⇒ Object
- #links ⇒ Object
Constructor Details
#initialize(url) ⇒ Page
Parameters
- url (String): URL to discover.
OpenURI can fail to fetch a page for a variety of reasons, such as 404s, SSL errors, or redirect loops. The constructor rescues `RuntimeError` and `OpenURI::HTTPError`.
Raises
- `PageNotAccessibleError` when OpenURI fails to fetch the page, for any reason.
# File 'lib/anaximander/page.rb', line 50
def initialize(url)
  @url = url
  @html = Nokogiri::HTML(open(url))
rescue RuntimeError, OpenURI::HTTPError
  raise PageNotAccessibleError
end
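The rescue-and-re-raise pattern in the constructor can be tried without network access. The following is a minimal sketch, not the library's code: `SketchPage` and its `fetcher` argument are hypothetical stand-ins, the error class is redefined locally, and only `RuntimeError` is rescued to keep the example free of external dependencies. The error message is also an addition; the real constructor raises without one.

```ruby
# Local stand-in for the error named in the documentation above.
class PageNotAccessibleError < StandardError; end

# Hypothetical class for illustration; the real Page calls OpenURI directly.
class SketchPage
  attr_reader :url, :body

  # `fetcher` is an assumption: any callable that returns the page body
  # or raises when the page cannot be fetched.
  def initialize(url, fetcher)
    @url = url
    @body = fetcher.call(url)
  rescue RuntimeError
    # Translate the low-level failure into one domain-specific error.
    raise PageNotAccessibleError, "could not fetch #{url}"
  end
end

ok      = ->(_url) { "<html></html>" }
failing = ->(_url) { raise "HTTP redirection loop" } # raises RuntimeError

page = SketchPage.new("http://example.com", ok)

# Callers only need to rescue a single error class to skip bad pages.
skipped = []
begin
  SketchPage.new("http://example.com/missing", failing)
rescue PageNotAccessibleError => e
  skipped << e.message
end
```

Wrapping the fetch errors this way means a crawler can skip an unreachable page with one `rescue` clause instead of knowing every failure mode of the HTTP layer.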
Instance Attribute Details
#children ⇒ Object
Collection of `Page` objects that are linked to from the current page.
# File 'lib/anaximander/page.rb', line 37
def children
  @children
end
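Because each page holds a collection of child pages, a crawl result forms a graph that can be walked recursively. The sketch below is one possible traversal and is not taken from the library: `Node` is a hypothetical stand-in exposing only `url` and `children`, and `each_page` is an illustrative helper name.

```ruby
# Hypothetical stand-in for Page, with just the two attributes used here.
Node = Struct.new(:url, :children)

# Depth-first walk. `seen` guards against pages that link back to each
# other, which would otherwise recurse forever.
def each_page(page, seen = {}, &block)
  return if seen[page.url]

  seen[page.url] = true
  block.call(page)
  Array(page.children).each { |child| each_page(child, seen, &block) }
end

about = Node.new("http://example.com/about", [])
home  = Node.new("http://example.com/", [about])
about.children = [home] # the about page links back to the home page

visited = []
each_page(home) { |page| visited << page.url }
```

Tracking visited URLs matters for real sites, where navigation links make cycles the norm rather than the exception.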
#html ⇒ Object (readonly)
Parsed Nokogiri HTML document.
# File 'lib/anaximander/page.rb', line 32
def html
  @html
end
#url ⇒ Object (readonly)
Absolute URL of the page.
# File 'lib/anaximander/page.rb', line 28
def url
  @url
end
Instance Method Details
#<=>(other) ⇒ Object
# File 'lib/anaximander/page.rb', line 65
def <=>(other)
  self.url <=> other.url
end
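Since the class includes `Comparable` and compares by URL, pages can be sorted and tested for equality by their URLs. A minimal sketch of that behavior follows, using a hypothetical `UrlOnlyPage` class that mirrors the comparison above without fetching anything.

```ruby
# Hypothetical stand-in that mirrors the URL-based comparison shown above.
class UrlOnlyPage
  include Comparable
  attr_reader :url

  def initialize(url)
    @url = url
  end

  # Comparable derives <, <=, ==, >=, > and between? from this method.
  def <=>(other)
    url <=> other.url
  end
end

a       = UrlOnlyPage.new("http://example.com/a")
b       = UrlOnlyPage.new("http://example.com/b")
a_again = UrlOnlyPage.new("http://example.com/a")

sorted  = [b, a].sort.map(&:url) # sort relies on <=>
same    = (a == a_again)         # equal because the URLs are equal
earlier = a < b
```

Note that `Comparable` does not define `eql?` or `hash`, so `Array#uniq` and Hash keys would still treat `a` and `a_again` as distinct objects unless those methods are defined as well.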
#assets ⇒ Object
# File 'lib/anaximander/page.rb', line 61
def assets
  Discovery::Assets.new(html)
end
#inspect ⇒ Object
# File 'lib/anaximander/page.rb', line 69
def inspect
  %(#<Anaximander::Page:#{object_id} url="#{url}">)
end
#links ⇒ Object
# File 'lib/anaximander/page.rb', line 57
def links
  Discovery::Links.new(html, url)
end