Class: Caboodle::FeedDetector

Inherits: Object
Defined in: lib/caboodle/scrape.rb

Class Method Details

.fetch_feed_url(page_url, only_detect = nil) ⇒ Object

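Returns the feed URL for a page URL. For example: blog.dominiek.com/ => blog.dominiek.com/feed/atom.xml. only_detect can force detection of :rss or :atom. Returns nil when no feed link is found, and also when the detected href is already an absolute http:// URL.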

# File 'lib/caboodle/scrape.rb', line 64

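def self.fetch_feed_url(page_url, only_detect=nil)
  url = URI.parse(page_url)
  # Build "host" or "host:port"; only port 80 is treated as the default.
  host_with_port = url.host
  host_with_port << ":#{url.port}" unless url.port == 80

  # Fetch the page HTML with Weary.
  res = Weary.get(page_url).perform_sleepily

  # Extract the feed href (e.g. "/feed/atom.xml") from the page's <link> tags.
  feed_url = self.get_feed_path(res.body, only_detect)
  # Build "http://host[:port]/path". Evaluates to nil when no feed was found,
  # and also when the href is already an absolute http:// URL.
  "http://#{host_with_port}/#{feed_url.gsub(/^\//, '')}" unless !feed_url || feed_url =~ /^http:\/\//
end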
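The string concatenation above only handles root-relative hrefs on plain http pages: for an https page it builds http://host:443/..., and an href that is already an absolute http:// URL yields nil. A minimal sketch of an alternative, assuming only that the page URL and the href returned by get_feed_path are available, resolves the href with Ruby's standard URI.join. absolute_feed_url is a hypothetical helper name, not part of Caboodle.

```ruby
require "uri"

# Hypothetical helper for illustration; not part of Caboodle.
# Resolves a feed href found on page_url into an absolute URL.
def absolute_feed_url(page_url, feed_href)
  return nil if feed_href.nil? || feed_href.empty?

  # URI.join keeps absolute hrefs unchanged, resolves "/feed" against the
  # site root and "feed.xml" against the page's directory, and preserves
  # the page's scheme and any non-default port.
  URI.join(page_url, feed_href).to_s
rescue URI::Error
  nil # malformed page URL or href
end

absolute_feed_url("http://blog.dominiek.com/", "/feed/atom.xml")
# => "http://blog.dominiek.com/feed/atom.xml"
```

Relative hrefs such as feed.xml resolve against the page's directory (RFC 3986 reference resolution) rather than being forced to the site root, which is what the gsub-based version does.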

.get_feed_path(html, only_detect = nil) ⇒ Object

Gets the feed href from an HTML document. For example, given

… <link href="/feed/atom.xml" rel="alternate" type="application/atom+xml" /> …

it returns /feed/atom.xml.

only_detect can force detection of :rss or :atom; without it, Atom links are tried before RSS links.

# File 'lib/caboodle/scrape.rb', line 83

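def self.get_feed_path(html, only_detect=nil)
  # Try Atom unless the caller restricted detection to another format.
  unless only_detect && only_detect != :atom
    # <link> with href before the type attribute...
    md ||= /<link.*href=['"]*([^\s'"]+)['"]*.*application\/atom\+xml.*>/.match(html)
    # ...or with the type attribute before href.
    md ||= /<link.*application\/atom\+xml.*href=['"]*([^\s'"]+)['"]*.*>/.match(html)
  end
  # Fall back to RSS (or use it exclusively when only_detect == :rss).
  unless only_detect && only_detect != :rss
    md ||= /<link.*href=['"]*([^\s'"]+)['"]*.*application\/rss\+xml.*>/.match(html)
    md ||= /<link.*application\/rss\+xml.*href=['"]*([^\s'"]+)['"]*.*>/.match(html)
  end
  # The captured href, or nil when nothing matched.
  md && md[1]
end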
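Because each .* in these patterns can run past the end of one <link> tag into the next, a non-feed link that precedes the feed link on the same line can be returned instead: on <link rel="stylesheet" href="/style.css"><link type="application/atom+xml" href="/feed.xml"> the first Atom pattern captures /style.css. A minimal sketch of one way to avoid this, assuming simple, well-formed <link> tags (it is still regex-based, not a real HTML parser), scans each tag on its own and reads its type and href attributes. feed_href_from_links and its constants are hypothetical names, not Caboodle's API.

```ruby
# Hypothetical helper for illustration; not part of Caboodle.
FEED_MIME_TYPES = { atom: "application/atom+xml", rss: "application/rss+xml" }.freeze

LINK_TAG_RE  = /<link\b[^>]*>/i                  # one <link ...> tag; [^>] stops at its end
TYPE_ATTR_RE = /\stype\s*=\s*["']?([^"'\s>]+)/i  # value of type=..., quoted or not
HREF_ATTR_RE = /\shref\s*=\s*["']?([^"'\s>]+)/i  # value of href=..., quoted or not

# Returns the href of the first Atom <link> (then the first RSS one), or nil.
# only_detect (:atom or :rss) restricts the search to one format.
def feed_href_from_links(html, only_detect = nil)
  formats = [:atom, :rss]
  formats &= [only_detect] if only_detect # unknown values leave nothing to try
  tags = html.scan(LINK_TAG_RE)

  formats.each do |format|
    mime = FEED_MIME_TYPES[format]
    tags.each do |tag|
      type = tag[TYPE_ATTR_RE, 1]
      next unless type && type.downcase == mime
      href = tag[HREF_ATTR_RE, 1]
      return href if href
    end
  end
  nil
end
```

The Atom-before-RSS order and the only_detect semantics follow get_feed_path, so on simple pages both return the same href.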