Class: HtmlPageTitle

Inherits: Object

Defined in: lib/html_page_title.rb

Overview

A simple class for finding the title of a given HTTP URL: it fetches the URL, follows any redirects, and parses the resulting document with Hpricot.

You can either use the shorthand form or initialize the instance properly:

* HtmlPageTitle('http://github.com')
* HtmlPageTitle.new('http://github.com')

The two calls are equivalent except for one subtle difference: the shorthand form swallows SocketError and returns nil (which happens, for example, for invalid URLs), while regular instantiation via new raises the error.
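The difference between the two forms can be sketched as follows. This is not the library's source: SketchPageTitle is a hypothetical stand-in that raises SocketError for anything that does not look like an http(s) URL instead of performing a real lookup, and the shorthand method is an assumption about how such a wrapper might be written.

```ruby
require 'socket' # defines SocketError

# Hypothetical stand-in for HtmlPageTitle; it performs no network access.
class SketchPageTitle
  attr_reader :original_url

  def initialize(original_url)
    @original_url = original_url
    # Simulate a failed host lookup for strings that are not http(s) URLs.
    unless original_url.start_with?('http://', 'https://')
      raise SocketError, "cannot resolve #{original_url}"
    end
  end
end

# Shorthand form: same name as the class, in the style of Kernel#Integer.
# It rescues SocketError and returns nil instead of letting it propagate.
def SketchPageTitle(url)
  SketchPageTitle.new(url)
rescue SocketError
  nil
end
```

With this sketch, SketchPageTitle('not a url') returns nil, while SketchPageTitle.new('not a url') raises SocketError.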

Three readers are available: the title, the heading (the content of the first h1 tag in the body), and the label. The label is the first available of the heading, the title, and the target URL after redirecting. If the title or the heading cannot be found (e.g. for a non-HTML document), the corresponding method returns nil, so label is the only one of the three that always returns a string.
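The fallback order of the label can be illustrated with a small stand-in. PageSketch is a hypothetical Struct used only for this example; it holds precomputed values rather than fetching anything.

```ruby
# Hypothetical stand-in holding the three values the real class derives from a page.
PageSketch = Struct.new(:heading, :title, :url) do
  # First non-nil value wins: heading, then title, then url.
  def label
    heading || title || url
  end
end

with_heading = PageSketch.new('Welcome', 'Site title', 'http://example.com/')
title_only   = PageSketch.new(nil, 'Site title', 'http://example.com/')
non_html     = PageSketch.new(nil, nil, 'http://example.com/file.pdf')

with_heading.label # => "Welcome"
title_only.label   # => "Site title"
non_html.label     # => "http://example.com/file.pdf"
```

Note that only nil and false are falsy in Ruby, so an empty string counts as available: an empty heading would be returned as the label rather than falling through to the title.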

Instance Attribute Summary

#original_url ⇒ Object readonly

Returns the value of attribute original_url.

Instance Method Summary

#body ⇒ Object

Returns the body of the document at the target URL (after any redirects).

#document ⇒ Object

Returns the body parsed as an Hpricot document.

#heading ⇒ Object

Retrieves the first h1 tag in the page body and returns its content.

#initialize(original_url) ⇒ HtmlPageTitle constructor

A new instance of HtmlPageTitle.

#label ⇒ Object

Returns the heading, the title, or the URL, in this order by availability.

#redirect ⇒ Object

Returns the redirect follower instance used for resolving this instance's URL.

#title ⇒ Object

Retrieves the title tag in the page head and returns its content.

#url ⇒ Object

Returns the target URL after all redirects.

Constructor Details

#initialize(original_url) ⇒ `HtmlPageTitle`

# File 'lib/html_page_title.rb', line 33

def initialize(original_url)
  @original_url = original_url
  title # retrieve data so exceptions can be thrown
end

Instance Attribute Details

#original_url ⇒ `Object` (readonly)

Returns the value of attribute original_url.



# File 'lib/html_page_title.rb', line 32

def original_url
  @original_url
end

Instance Method Details

#body ⇒ `Object`

Returns the body of the document at the target URL (after any redirects).



# File 'lib/html_page_title.rb', line 75

def body
  redirect.body
end

#document ⇒ `Object`

Returns the body parsed as an Hpricot document. The result is memoized.

# File 'lib/html_page_title.rb', line 38

def document
  @document ||= Hpricot(redirect.body)
end

#heading ⇒ `Object`

Retrieves the first h1 tag in the page body and returns its content, or nil if there is none.

# File 'lib/html_page_title.rb', line 50

def heading
  return @heading if @heading
  if heading_tag = document.at('body h1')
    @heading = heading_tag.inner_html.strip.chomp
  end
end

#label ⇒ `Object`

Returns the heading, the title, or the URL, in this order by availability.



# File 'lib/html_page_title.rb', line 59

def label
  heading or title or url
end

#redirect ⇒ `Object`

Returns the redirect follower instance used for resolving this instance's URL.



# File 'lib/html_page_title.rb', line 65

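def redirect
  # Memoize so that body, url and document share one follower
  # instead of building a new one on every call.
  @redirect ||= RedirectFollower.new(original_url)
end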
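
Whether an accessor like #redirect assigns with = or with ||= decides whether every call builds a new object. A minimal sketch of the difference, using a hypothetical Holder class that is not part of this library and that counts how often its helper object is built:

```ruby
# Hypothetical class contrasting plain assignment with ||= memoization.
class Holder
  attr_reader :builds

  def initialize
    @builds = 0
  end

  # Plain assignment: a new object is built on every call.
  def fresh
    @fresh = build
  end

  # Memoized: the object is built on the first call and reused afterwards.
  def cached
    @cached ||= build
  end

  private

  # Stand-in for an expensive constructor such as a redirect follower.
  def build
    @builds += 1
    Object.new
  end
end

holder = Holder.new
first_fresh   = holder.fresh
second_fresh  = holder.fresh   # a different object; builds is now 2
first_cached  = holder.cached
second_cached = holder.cached  # the same object; builds is now 3
```

For a helper that resolves a URL over the network, the memoized form would avoid repeating that work for each of body, url and document, assuming the helper does its work per instance.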

#title ⇒ `Object`

Retrieves the title tag in the page head and returns its content, or nil if there is none.

# File 'lib/html_page_title.rb', line 42

def title
  return @title if @title
  if title_tag = document.at('head title')
    @title = title_tag.inner_html.strip.chomp
  end
end
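
One property of the `return @title if @title` pattern used by #title and #heading is that a nil result is never cached, so a missing tag is looked up again on every call. The following sketch shows this with a hypothetical MemoSketch class whose lookup method stands in for the document query and counts its invocations; it is an illustration under that assumption, not the library's code.

```ruby
# Hypothetical class mirroring the caching pattern of #title.
class MemoSketch
  attr_reader :lookups

  # found is the raw value the simulated lookup returns (a String or nil).
  def initialize(found)
    @found = found
    @lookups = 0
  end

  def title
    return @title if @title
    if (value = lookup)
      @title = value.strip
    end
    # When the lookup returns nil, the if expression evaluates to nil
    # and nothing is stored, so the next call performs the lookup again.
  end

  private

  # Stand-in for a query such as document.at('head title').
  def lookup
    @lookups += 1
    @found
  end
end

present = MemoSketch.new("  GitHub  ")
2.times { present.title }   # one lookup, then the cached value

missing = MemoSketch.new(nil)
2.times { missing.title }   # two lookups, nil both times
```

In the class documented here the parsed document itself is memoized, so the repeated work for a missing tag is limited to the element search rather than a new fetch.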

#url ⇒ `Object`

Returns the target URL after all redirects.



# File 'lib/html_page_title.rb', line 70

def url
  redirect.url
end
