Class: Hongkong::News::Scrapers::MingpaoScraper
- Inherits:
- Object
- Hongkong::News::Scrapers::MingpaoScraper
- Includes:
- PhantomScraper
- Defined in:
- lib/hongkong/news/scrapers/mingpao_scraper.rb
Constant Summary
- LIST_URL =
"http://news.mingpao.com/pns/%E6%96%B0%E8%81%9E%E7%B8%BD%E8%A6%BD/web_tc/archive/latest"
Instance Method Summary
- #name ⇒ Object
Returns the source name, "mingpao".
- #news(url) ⇒ Object
Extracts the article from the Mingpao page at the given URL.
- #news_links ⇒ Object
Extracts all news links from the Mingpao listing page.
Methods included from PhantomScraper
Instance Method Details
#name ⇒ Object
# File 'lib/hongkong/news/scrapers/mingpao_scraper.rb', line 12
def name
  "mingpao"
end
#news(url) ⇒ Object
Extracts the article from the Mingpao page at the given URL.
# File 'lib/hongkong/news/scrapers/mingpao_scraper.rb', line 29
def news(url)
  visit url
  # wait for content to be loaded
  first("article p")
  document = Document.new
  document.source = name
  document.title = doc.search("h1").text
  document.url = url
  document.html = html
  document.content = page.evaluate_script("HongKongNews.getInnerText('article')")
  document.screenshot_data = screenshot_data
  document.image_url = doc.search("//meta[@property='og:image']/@content").first.text rescue nil
  document
end
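The `rescue nil` modifier in #news keeps a page without an og:image tag from raising. As a narrower alternative, the sketch below uses the safe-navigation operator, which returns nil only when no node was found instead of swallowing every StandardError. `Node` and `first_text` are hypothetical stand-ins for illustration and are not part of the scraper.

```ruby
# Hypothetical stand-in for a parsed node; only #text matters here.
Node = Struct.new(:text)

# Returns the text of the first node, or nil when the list is empty.
# Unlike `rescue nil`, unrelated errors still propagate.
def first_text(nodes)
  nodes.first&.text
end

with_image    = [Node.new("http://example.com/a.jpg")] # assumed sample value
without_image = []

puts first_text(with_image).inspect
puts first_text(without_image).inspect
```

In #news this would correspond to calling `first&.text` on the search result, assuming the search returns an empty collection when the tag is absent.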
#news_links ⇒ Object
Extracts all news links from the Mingpao listing page (LIST_URL).
# File 'lib/hongkong/news/scrapers/mingpao_scraper.rb', line 17
def news_links
  visit LIST_URL
  all(".listing ul li a").collect do |anchor|
    link = Link.new
    link.title = anchor.text
    link.url = URI.join(LIST_URL, anchor["href"]).to_s
    link
  end
end
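#news_links passes each anchor's href through URI.join with LIST_URL, so relative hrefs become absolute URLs. The following is a minimal sketch of that resolution step using only the standard library. The helper name `absolutize` and the sample hrefs are assumptions for illustration, not values taken from the Mingpao site.

```ruby
require "uri"

LIST_URL = "http://news.mingpao.com/pns/%E6%96%B0%E8%81%9E%E7%B8%BD%E8%A6%BD/web_tc/archive/latest"

# Hypothetical helper: resolves an href against the listing page URL.
def absolutize(href)
  URI.join(LIST_URL, href).to_s
end

# Sample hrefs (assumed): root-relative, document-relative, already absolute.
["/pns/example", "../article/1", "http://example.com/a"].each do |href|
  puts "#{href} -> #{absolutize(href)}"
end
```

A root-relative href keeps only the scheme and host of LIST_URL, a document-relative href is resolved against the directory of the listing page, and an absolute URL is returned unchanged.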