Class: Anaximander::Page

Inherits:
Object
Includes:
Comparable
Defined in:
lib/anaximander/page.rb

Overview

Represents a single page of a website being crawled. Exposes the assets and links on the page.

Errors

`Anaximander::Page` raises a `PageNotAccessibleError` when the page cannot be fetched. Common causes are a missing page (404), SSL errors, and infinite redirect loops.
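A caller that builds many pages will usually want to skip the ones that cannot be fetched instead of aborting. The following is a minimal sketch of that pattern, not code from the library: `FakePage` and `load_pages` are hypothetical names, and `PageNotAccessibleError` is redefined here as a `StandardError` subclass (an assumption) so the example runs without network access.

```ruby
# Stand-in for the library's error class (assumed to be a StandardError subclass).
class PageNotAccessibleError < StandardError; end

# Hypothetical stand-in for Anaximander::Page. It fails for any URL listed in
# UNREACHABLE, so the example needs no network access.
class FakePage
  UNREACHABLE = ["http://example.com/missing"].freeze

  attr_reader :url

  def initialize(url)
    raise PageNotAccessibleError if UNREACHABLE.include?(url)
    @url = url
  end
end

# Build a page for every reachable URL and collect the URLs that failed.
def load_pages(urls)
  pages = []
  skipped = []
  urls.each do |url|
    begin
      pages << FakePage.new(url)
    rescue PageNotAccessibleError
      skipped << url
    end
  end
  [pages, skipped]
end

pages, skipped = load_pages(["http://example.com", "http://example.com/missing"])
puts pages.map(&:url).inspect
puts skipped.inspect
```

With the real class, the same `rescue PageNotAccessibleError` clause would wrap the call to `Page.new`.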

Example

page = Page.new("http://example.com")
page.links  # => ["http://www.iana.org/domains/example"]
page.assets # => ["/main.css", "/default.js"]

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(url) ⇒ Page

Parameters

url (String): URL to discover.

OpenURI can fail to fetch a page for a variety of reasons, such as 404s, SSL errors, or redirect loops, and in some of these cases it raises only a generic RuntimeError.

Raises `PageNotAccessibleError` when OpenURI fails to fetch the page, for any reason.



# File 'lib/anaximander/page.rb', line 50

def initialize(url)
  @url  = url
  @html = Nokogiri::HTML(open(url))
rescue RuntimeError, OpenURI::HTTPError
  raise PageNotAccessibleError
end
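The rescue clause above translates low-level fetch failures into a single domain error. A minimal sketch of that translation in isolation follows; `fetch_or_raise` is a hypothetical helper, the error class is a local stand-in, and the failures are raised by hand instead of by a real HTTP request.

```ruby
require "open-uri"

# Stand-in for the library's error class (assumed to be a StandardError subclass).
class PageNotAccessibleError < StandardError; end

# Hypothetical helper: runs the given block and converts the same two error
# classes that #initialize rescues into PageNotAccessibleError.
def fetch_or_raise
  yield
rescue RuntimeError, OpenURI::HTTPError
  raise PageNotAccessibleError
end

# Simulated failures, raised by hand so no network access is needed.
failures = [
  -> { raise "HTTP redirection loop" },                      # generic RuntimeError
  -> { raise OpenURI::HTTPError.new("404 Not Found", nil) }  # HTTP-level failure
]

failures.each do |failure|
  begin
    fetch_or_raise(&failure)
  rescue PageNotAccessibleError => e
    # Ruby records the original exception as the cause of the new one.
    puts "translated from #{e.cause.class}"
  end
end
```

Because Ruby sets `cause` automatically when an exception is raised inside a `rescue` clause, the original failure stays available for logging even though callers only need to rescue one class.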

Instance Attribute Details

#children ⇒ Object

Collection of `Page` objects that are linked to from the current page.



# File 'lib/anaximander/page.rb', line 37

def children
  @children
end
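Since each page holds its linked pages in `children`, a crawl result forms a graph that can contain cycles (two pages linking to each other). The following is a sketch of one way to walk such a graph, under stated assumptions: `Node` is a hypothetical stand-in for `Page` with only `url` and `children`, and `each_page` is not part of the library.

```ruby
# Hypothetical stand-in for Anaximander::Page with only the fields the walk needs.
Node = Struct.new(:url, :children)

# Depth-first walk that yields each page once, keyed by URL, so that pages
# linking back to each other do not cause infinite recursion.
def each_page(root, seen = {}, &block)
  return if seen[root.url]

  seen[root.url] = true
  block.call(root)
  # Array(nil) is [], so pages without children are handled too.
  Array(root.children).each { |child| each_page(child, seen, &block) }
end

about = Node.new("http://example.com/about", [])
home  = Node.new("http://example.com", [about])
about.children << home # the about page links back to the home page

each_page(home) { |page| puts page.url }
```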

#html ⇒ Object (readonly)

Parsed Nokogiri HTML document.



# File 'lib/anaximander/page.rb', line 32

def html
  @html
end

#url ⇒ Object (readonly)

Absolute URL of the page.



# File 'lib/anaximander/page.rb', line 28

def url
  @url
end

Instance Method Details

#<=>(other) ⇒ Object



# File 'lib/anaximander/page.rb', line 65

def <=>(other)
  self.url <=> other.url
end
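Because the class includes `Comparable`, defining `<=>` in terms of `url` also provides `<`, `==`, `between?`, and sorting. A small sketch of the effect, using a hypothetical `ComparablePage` stand-in that copies the `<=>` definition but does no fetching:

```ruby
# Hypothetical stand-in for Anaximander::Page: same <=> definition, no fetching.
class ComparablePage
  include Comparable

  attr_reader :url

  def initialize(url)
    @url = url
  end

  # Pages are ordered by their URL strings.
  def <=>(other)
    url <=> other.url
  end
end

a = ComparablePage.new("http://example.com/a")
b = ComparablePage.new("http://example.com/b")

puts a < b                                            # ordering operators from Comparable
puts a == ComparablePage.new("http://example.com/a")  # equality by URL, not identity
puts [b, a].sort.map(&:url).inspect                   # sort uses <=>
puts [b, a].min.url
```

Note that `Comparable` does not define `hash` or `eql?`, so `Array#uniq` and `Hash` keys would still treat two pages with the same URL as distinct unless those methods are defined as well.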

#assets ⇒ Object



# File 'lib/anaximander/page.rb', line 61

def assets
  Discovery::Assets.new(html)
end

#inspect ⇒ Object



# File 'lib/anaximander/page.rb', line 69

def inspect
  %(#<Anaximander::Page:#{object_id} url="#{url}">)
end


#links ⇒ Object

# File 'lib/anaximander/page.rb', line 57

def links
  Discovery::Links.new(html, url)
end