Module: Gbbib::SecScrapper
Extended by: Scrapper
Defined in: lib/gbbib/sec_scrapper.rb
Overview
Scraper for sector standards.
Class Method Summary
Methods included from Scrapper
get_docid, get_status, get_titles, get_type, scrapped_data
Class Method Details
.scrape_doc(pid) ⇒ Gbbib::GbBibliographicItem
    # File 'lib/gbbib/sec_scrapper.rb', line 31
    def scrape_doc(pid)
      src = "http://www.std.gov.cn/hb/search/stdHBDetailed?id=#{pid}"
      page_uri = URI src
      doc = Nokogiri::HTML Net::HTTP.get(page_uri)
      GbBibliographicItem.new scrapped_data(doc, src: src)
    end
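As a rough illustration of the request that scrape_doc makes, the sketch below only builds the detail-page URI and inspects its parts, without touching the network. The helper name detail_uri and the pid value 'ABC123' are hypothetical and not part of Gbbib; escaping the pid with URI.encode_www_form is an assumption added here, since the listing interpolates it directly.

```ruby
require 'uri'

# Hypothetical helper (not part of Gbbib): builds the detail-page URI
# that scrape_doc would request for a given pid.
def detail_uri(pid)
  # Assumption: the pid is form-encoded; the original listing interpolates it as-is.
  query = URI.encode_www_form(id: pid)
  URI("http://www.std.gov.cn/hb/search/stdHBDetailed?#{query}")
end

uri = detail_uri('ABC123') # 'ABC123' is a made-up pid for illustration
puts uri.host
puts uri.path
puts uri.query
```

In the real method, the body fetched from this URI is parsed with Nokogiri::HTML and passed to scrapped_data together with the source URL.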
.scrape_page(text) ⇒ Gbbib::HitCollection
    # File 'lib/gbbib/sec_scrapper.rb', line 20
    def scrape_page(text)
      uri = URI "http://www.std.gov.cn/hb/search/hbPage?searchText=#{text}"
      res = JSON.parse Net::HTTP.get(uri)
      hits = res['rows'].map do |r|
        Hit.new pid: r['id'], title: r['STD_CODE'], scrapper: self
      end
      HitCollection.new hits
    end
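The mapping from the search response to hits can be sketched offline as follows. This is a minimal sketch, not the library's implementation: SketchHit is a hypothetical stand-in for Gbbib::Hit, the sample JSON values are invented, and the query escaping in search_uri is an assumption (the listing interpolates the search text directly). Only the 'rows', 'id', and 'STD_CODE' keys come from the listing.

```ruby
require 'json'
require 'uri'

# Hypothetical stand-in for Gbbib::Hit so the sketch runs without the gem.
SketchHit = Struct.new(:pid, :title, keyword_init: true)

# Builds the search URI. Assumption: the search text is form-encoded,
# which the original listing does not do.
def search_uri(text)
  query = URI.encode_www_form(searchText: text)
  URI("http://www.std.gov.cn/hb/search/hbPage?#{query}")
end

# Maps each entry of the response's 'rows' array to a hit, using the same
# keys ('id' and 'STD_CODE') as scrape_page. A missing 'rows' key yields no hits.
def hits_from(json)
  JSON.parse(json).fetch('rows', []).map do |row|
    SketchHit.new(pid: row['id'], title: row['STD_CODE'])
  end
end

# Invented payload with the shape scrape_page expects.
sample = '{"rows":[{"id":"A1","STD_CODE":"XX/T 0001-2000"},{"id":"B2","STD_CODE":"XX/T 0002-2001"}]}'

hits_from(sample).each { |hit| puts "#{hit.pid}: #{hit.title}" }
```

In the library itself, each hit also receives scrapper: self, and the resulting array is wrapped in a HitCollection.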