Class: HtmlPageTitle

Inherits:
Object
Defined in:
lib/html_page_title.rb

Overview

A simple class for finding the title of a given HTTP URL: it fetches the URL, follows any redirects, and parses the resulting document with Hpricot.

You can either use the shorthand form or initialize the instance properly:

* HtmlPageTitle('http://github.com')
* HtmlPageTitle.new('http://github.com')

The two calls are equivalent except for one subtle difference: the shorthand form swallows a SocketError and returns nil (this happens, for example, with invalid URLs), while regular instantiation via new raises that error.
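The difference between the two forms can be sketched without touching the network. The following is only an illustration under stated assumptions: DemoPageTitle is a hypothetical stand-in class (not part of the library) that simulates a failed host lookup for any URL ending in ".invalid", and the shorthand is written as a same-named method in the style of Kernel#Integer. The real shorthand may be implemented differently.

```ruby
require 'socket' # defines SocketError

# Hypothetical stand-in for HtmlPageTitle. It simulates a failed host
# lookup instead of performing a real HTTP request.
class DemoPageTitle
  attr_reader :original_url

  def initialize(original_url)
    # Assumption: URLs ending in ".invalid" behave like unresolvable hosts.
    raise SocketError, 'getaddrinfo: name not known' if original_url.end_with?('.invalid')
    @original_url = original_url
  end
end

# Shorthand form: same arguments as new, but a SocketError becomes nil.
def DemoPageTitle(original_url)
  DemoPageTitle.new(original_url)
rescue SocketError
  nil
end

DemoPageTitle('http://example.invalid') # => nil
DemoPageTitle('http://github.com')      # => a DemoPageTitle instance
```

With this shape, callers that only want a best-effort result can use the shorthand and check for nil, while callers that need to tell a lookup failure apart from a missing title use new and rescue SocketError themselves.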

You can get the title, the heading (the content of the first h1 tag in the body), or the label. The label is the first available of the heading, the title, and the target URL after redirecting. Note that if the title or the heading cannot be found (e.g. for a non-HTML document), the corresponding method returns nil, so label is the only method that always returns some kind of string.
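The fallback order of the label can be illustrated with a small stand-alone sketch. DemoPage is a hypothetical Struct used only for this example; it holds precomputed values rather than fetching anything.

```ruby
# Hypothetical value object with the three fields the label draws on.
DemoPage = Struct.new(:heading, :title, :url) do
  # First non-nil value in the order heading, title, url.
  def label
    heading || title || url
  end
end

html_page  = DemoPage.new('Welcome', 'Example Site', 'http://example.com/')
no_heading = DemoPage.new(nil, 'Example Site', 'http://example.com/')
non_html   = DemoPage.new(nil, nil, 'http://example.com/file.pdf')

html_page.label  # => "Welcome"
no_heading.label # => "Example Site"
non_html.label   # => "http://example.com/file.pdf"
```

Because the URL is always present, the chain never ends in nil, which is why the label is the safe choice for display purposes.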

Constructor Details

#initialize(original_url) ⇒ HtmlPageTitle



# File 'lib/html_page_title.rb', line 33

def initialize(original_url)
  @original_url = original_url
  # Fetch and parse eagerly so that network errors (e.g. SocketError)
  # are raised at construction time rather than on first access.
  title
end

Instance Attribute Details

#original_url ⇒ Object (readonly)

Returns the value of attribute original_url.



# File 'lib/html_page_title.rb', line 32

def original_url
  @original_url
end

Instance Method Details

#body ⇒ Object

Returns the body of the document at the target URL, after any redirects.



# File 'lib/html_page_title.rb', line 75

def body
  redirect.body
end

#document ⇒ Object



# File 'lib/html_page_title.rb', line 38

def document
  @document ||= Hpricot(redirect.body)
end

#heading ⇒ Object

Retrieves the first h1 tag in the page body and returns its content, or nil if there is none.



# File 'lib/html_page_title.rb', line 50

def heading
  return @heading if @heading
  if heading_tag = document.at('body h1')
    @heading = heading_tag.inner_html.strip.chomp
  end
end

#label ⇒ Object

Returns the first available of the heading, the title, or the URL, in that order.



# File 'lib/html_page_title.rb', line 59

def label
  heading or title or url
end

#redirect ⇒ Object

Returns the redirect follower instance used for resolving this instance's URL.



# File 'lib/html_page_title.rb', line 65

def redirect
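  # Memoize the follower so body, url and document share one instance
  # instead of creating a new follower on every call.
  @redirect ||= RedirectFollower.new(original_url)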
end
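
What "following redirects" amounts to can be sketched with an in-memory table instead of real HTTP requests. This is not RedirectFollower's implementation; DEMO_REDIRECTS, resolve_redirects and the limit of 5 hops are all assumptions made for this illustration.

```ruby
# Hypothetical stand-in for HTTP responses: each URL maps to the URL it
# redirects to, or to nil when it serves the final document.
DEMO_REDIRECTS = {
  'http://short.example/a'    => 'http://www.example.com/a',
  'http://www.example.com/a'  => 'https://www.example.com/a',
  'https://www.example.com/a' => nil
}.freeze

# Walks the table until a URL with no further redirect is reached.
# Raises if more than `limit` lookups are needed, which guards against loops.
def resolve_redirects(url, table, limit = 5)
  limit.times do
    target = table[url]
    return url if target.nil?
    url = target
  end
  raise ArgumentError, 'too many redirects'
end

resolve_redirects('http://short.example/a', DEMO_REDIRECTS)
# => "https://www.example.com/a"
```

A hop limit of this kind is a common safeguard, since two URLs that redirect to each other would otherwise never terminate.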

#title ⇒ Object



# File 'lib/html_page_title.rb', line 42

def title
  return @title if @title
  if title_tag = document.at('head title')
    @title = title_tag.inner_html.strip.chomp
  end
end

#url ⇒ Object

Returns the target URL after all redirects.



# File 'lib/html_page_title.rb', line 70

def url
  redirect.url
end