Class: HtmlPageTitle
- Inherits: Object
- Defined in: lib/html_page_title.rb
Overview
A simple class for finding the title of a given HTTP URL by fetching that URL, following any redirects, and finally parsing the result with Hpricot.
You can either use the shorthand form or initialize the instance properly:
* HtmlPageTitle('http://github.com')
* HtmlPageTitle.new('http://github.com')
These calls are equivalent except for one subtle difference: the shorthand form swallows SocketErrors and returns nil (this happens for invalid URLs, for example), while regular instantiation via new raises the error.
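The difference between the two forms can be sketched offline as follows. This is a minimal illustration, not the library's source: `PageTitleStub` and its ".invalid" failure rule are hypothetical stand-ins, and the shorthand is assumed to be a method that wraps `new` and rescues `SocketError`.

```ruby
require 'socket' # defines SocketError

# Hypothetical stand-in for HtmlPageTitle. It fetches nothing and only
# simulates a failed host lookup for URLs ending in ".invalid".
class PageTitleStub
  attr_reader :original_url

  def initialize(original_url)
    if original_url.end_with?('.invalid')
      raise SocketError, 'getaddrinfo: name or service not known'
    end
    @original_url = original_url
  end
end

# Assumed shape of the shorthand form: the same call as .new, but a
# SocketError is turned into nil instead of propagating.
def PageTitleStub(url)
  PageTitleStub.new(url)
rescue SocketError
  nil
end

PageTitleStub('http://nonexistent.invalid') # returns nil
```

With the shorthand, callers check for nil; with `new`, they wrap the call in their own `rescue SocketError`.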
You can get the title, the heading (the content of the first h1 tag in the body), or the label. The label is the first available of the heading, the title, and the target URL after redirecting. Note that if the title or the heading cannot be found (e.g. for a non-HTML document), the respective method returns nil, so label is the only method that always returns some kind of string.
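The fallback order of the label can be illustrated without any network access. In this sketch `PageStub` is a hypothetical struct standing in for a fetched page, and the example values are invented; only the `heading or title or url` chain mirrors the class.

```ruby
# Hypothetical stand-in exposing the three readers described above.
PageStub = Struct.new(:heading, :title, :url) do
  # First available of heading, title, and url, in that order.
  def label
    heading or title or url
  end
end

with_heading = PageStub.new('Welcome', 'Example Site', 'http://example.com/')
title_only   = PageStub.new(nil, 'Example Site', 'http://example.com/')
non_html     = PageStub.new(nil, nil, 'http://example.com/file.pdf')

with_heading.label # => "Welcome"
title_only.label   # => "Example Site"
non_html.label     # => "http://example.com/file.pdf"
```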
Instance Attribute Summary
- #original_url ⇒ Object (readonly)
  Returns the value of attribute original_url.
Instance Method Summary
- #body ⇒ Object
  Returns the body of the document at the target URL (after any redirects).
- #document ⇒ Object
- #heading ⇒ Object
  Retrieves the first h1 tag in the page and returns its content.
- #initialize(original_url) ⇒ HtmlPageTitle (constructor)
  A new instance of HtmlPageTitle.
- #label ⇒ Object
  Returns the heading, the title, or the URL, in this order by availability.
- #redirect ⇒ Object
  Returns the redirect follower instance used for resolving this instance's URL.
- #title ⇒ Object
- #url ⇒ Object
  Returns the target URL after all redirects.
Constructor Details
#initialize(original_url) ⇒ HtmlPageTitle
    # File 'lib/html_page_title.rb', line 33
    def initialize(original_url)
      @original_url = original_url
      title # retrieve data so exceptions can be thrown
    end
Instance Attribute Details
#original_url ⇒ Object (readonly)
Returns the value of attribute original_url.
    # File 'lib/html_page_title.rb', line 32
    def original_url
      @original_url
    end
Instance Method Details
#body ⇒ Object
Returns the body of the document at the target URL (after any redirects).
    # File 'lib/html_page_title.rb', line 75
    def body
      redirect.body
    end
#document ⇒ Object
    # File 'lib/html_page_title.rb', line 38
    def document
      @document ||= Hpricot(redirect.body)
    end
#heading ⇒ Object
Retrieves the first h1 tag in the page and returns its content.
    # File 'lib/html_page_title.rb', line 50
    def heading
      return @heading if @heading
      if heading_tag = document.at('body h1')
        @heading = heading_tag.inner_html.strip.chomp
      end
    end
#label ⇒ Object
Returns the heading, the title, or the URL, in this order by availability.
    # File 'lib/html_page_title.rb', line 59
    def label
      heading or title or url
    end
#redirect ⇒ Object
Returns the redirect follower instance used for resolving this instance's URL.
    # File 'lib/html_page_title.rb', line 65
    def redirect
      @redirect = RedirectFollower.new(original_url)
    end
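Note that this method assigns with `=`, whereas #document uses `||=`. As written, each call to #redirect therefore builds a new follower. Whether that is intentional is not stated here. The sketch below only illustrates the behavioural difference between the two assignment forms; `CountingFollower`, `PlainAssign`, and `Memoized` are hypothetical names and not part of the library.

```ruby
# Hypothetical follower that counts how often it is constructed.
class CountingFollower
  @created = 0
  class << self
    attr_accessor :created
  end

  def initialize(url)
    self.class.created += 1
    @url = url
  end
end

class PlainAssign
  def redirect
    # Plain assignment: a new follower is built on every call.
    @redirect = CountingFollower.new('http://example.com')
  end
end

class Memoized
  def redirect
    # Conditional assignment: built on first call, then reused.
    @redirect ||= CountingFollower.new('http://example.com')
  end
end

plain = PlainAssign.new
2.times { plain.redirect }
CountingFollower.created # => 2

CountingFollower.created = 0
memo = Memoized.new
2.times { memo.redirect }
CountingFollower.created # => 1
```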
#title ⇒ Object
    # File 'lib/html_page_title.rb', line 42
    def title
      return @title if @title
      if title_tag = document.at('head title')
        @title = title_tag.inner_html.strip.chomp
      end
    end
#url ⇒ Object
Returns the target URL after all redirects.
    # File 'lib/html_page_title.rb', line 70
    def url
      redirect.url
    end