Class: UrlExpander::Expanders::Scrape

Inherits:
Object
Defined in:
lib/url_expander/expanders/scrape.rb

Overview

Some websites don’t follow common conventions: they provide neither an API nor a 301 redirect. The only way to resolve the shortened URL is to parse the returned HTML document.

To use the Scrape class, define your class inside the scrape folder. Your class must provide the following:

def initialize(short_url="", options={})
def self.scrape_url(html)
class Request

Example:

class Qsrli < UrlExpander::Expanders::Scrape

  PATTERN = %r'(http://qsr\.li(/[\w/]+))'
  attr_reader :parent_klass, :xpath

  def initialize(short_url="", options={})
    @parent_klass = self.class
    super(short_url, options)
  end

  def self.scrape_url(html)
    doc = Hpricot(html)
    doc.at('//*[@id="framecontent"]').attributes["src"]
  end

  class Request
    include HTTParty
    base_uri 'http://qsr.li'
  end

end
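
To try the three-part contract (initialize, scrape_url, Request) without network access, it may help to look at a dependency-free sketch. Everything here is a stand-in, not the gem's code: `SketchScrape` and `SketchQsrli` are hypothetical names, the body of `fetch_url` is an assumption about how the base class ties the pieces together, the canned HTML and the `example.com` URL are invented, and a regular expression replaces the Hpricot lookup.

```ruby
# Hypothetical stand-in for UrlExpander::Expanders::Scrape (not the gem's code).
class SketchScrape
  attr_reader :long_url, :parent_klass

  def initialize(short_url = "", options = {})
    match = short_url.match(parent_klass::PATTERN)
    raise "invalid pattern" unless match

    @long_url = fetch_url(match[2])
  end

  private

  # Assumed flow: ask the subclass's Request for the page, then let the
  # subclass extract the destination URL from the HTML.
  def fetch_url(path)
    html = parent_klass::Request.get(path)
    parent_klass.scrape_url(html)
  end
end

class SketchQsrli < SketchScrape
  PATTERN = %r'(http://qsr\.li(/[\w/]+))'

  def initialize(short_url = "", options = {})
    @parent_klass = self.class # must be set before super reads PATTERN
    super(short_url, options)
  end

  # Regex stand-in for the Hpricot lookup, to keep the sketch dependency-free.
  def self.scrape_url(html)
    html[/id="framecontent"[^>]*\bsrc="([^"]+)"/, 1]
  end

  # Returns canned HTML instead of performing an HTTParty request.
  class Request
    def self.get(path)
      %(<html><frame id="framecontent" src="http://example.com/long#{path}"></html>)
    end
  end
end

puts SketchQsrli.new("http://qsr.li/5ZJ").long_url
```

The subclass assigns `@parent_klass` before calling `super` because the base constructor looks up `PATTERN` through `parent_klass`.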

Direct Known Subclasses

Qsrli, Shorl, Simurl

Constructor Details

#initialize(short_url = "", options = {}) ⇒ Scrape

Returns a new instance of Scrape.

# File 'lib/url_expander/expanders/scrape.rb', line 40

def initialize(short_url="",options={})
  if short_url.match(parent_klass::PATTERN)
    path = $2
  else
    raise 'invalid pattern'
  end
  
  @long_url = fetch_url(path)
end
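
The constructor depends on the second capture group of the subclass's PATTERN to obtain the path. A small sketch of that step, using the Qsrli pattern from the overview; the helper name `extract_path` and the short URL `http://qsr.li/5ZJ` are made up for illustration:

```ruby
PATTERN = %r'(http://qsr\.li(/[\w/]+))'

# Hypothetical helper mirroring the constructor's match-or-raise step.
def extract_path(short_url, pattern = PATTERN)
  match = short_url.match(pattern)
  raise "invalid pattern" unless match

  match[2] # same value the constructor reads from $2
end

puts extract_path("http://qsr.li/5ZJ") # prints "/5ZJ"
```

Reading the group from the MatchData object rather than `$2` avoids relying on a special variable that a later match would overwrite.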

Instance Attribute Details

#long_url ⇒ Object

Returns the value of attribute long_url.

# File 'lib/url_expander/expanders/scrape.rb', line 37

def long_url
  @long_url
end

#parent_klass ⇒ Object (readonly)

Returns the value of attribute parent_klass.

# File 'lib/url_expander/expanders/scrape.rb', line 38

def parent_klass
  @parent_klass
end

#parttern ⇒ Object (readonly)

Returns the value of attribute parttern.

# File 'lib/url_expander/expanders/scrape.rb', line 38

def parttern
  @parttern
end