Module: RelatonGb::SecScrapper
- Extended by: Scrapper
- Defined in: lib/relaton_gb/sec_scrapper.rb
Overview
Scraper for sector standards. Searches for and fetches records from www.std.gov.cn.
Class Method Summary
- .scrape_doc(pid) ⇒ RelatonGb::GbBibliographicItem
- .scrape_page(text) ⇒ RelatonGb::HitCollection
Methods included from Scrapper
fetch_structuredidentifier, get_contributors, get_docid, get_status, get_titles, get_type, scrapped_data
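Because the module is extended by Scrapper, the helpers listed above are callable on the module itself, which is how a class method such as scrape_doc can call scrapped_data directly. The sketch below illustrates that mechanism with hypothetical stand-ins (DemoScrapper, DemoSecScrapper, summary); the use of `class << self` is an assumption about how the class methods are declared, not something this page confirms.

```ruby
# Hypothetical stand-ins; these are not the real RelatonGb modules.
module DemoScrapper
  # Shared helper, playing the role of the helpers listed above.
  def get_type
    "standard"
  end
end

module DemoSecScrapper
  # `extend` adds DemoScrapper's instance methods to the module object itself.
  extend DemoScrapper

  class << self
    # A class method can call the extended helper without a receiver.
    def summary(pid)
      "#{pid}: #{get_type}"
    end
  end
end

puts DemoSecScrapper.summary("abc123") # prints "abc123: standard"
```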
Class Method Details
.scrape_doc(pid) ⇒ RelatonGb::GbBibliographicItem
# File 'lib/relaton_gb/sec_scrapper.rb', line 34

    def scrape_doc(pid)
      src = "http://www.std.gov.cn/hb/search/stdHBDetailed?id=#{pid}"
      page_uri = URI src
      doc = Nokogiri::HTML Net::HTTP.get(page_uri)
      GbBibliographicItem.new scrapped_data(doc, src: src)
    rescue SocketError, Timeout::Error, Errno::EINVAL, Errno::ECONNRESET, EOFError,
           Net::HTTPBadResponse, Net::HTTPHeaderSyntaxError, Net::ProtocolError
      raise RelatonBib::RequestError, "Cannot access #{src}"
    end
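The rescue clause in scrape_doc turns a list of low-level network exceptions into a single RelatonBib::RequestError. The following is a minimal sketch of that pattern in isolation. DemoRequestError, NETWORK_ERRORS, and with_request_error are hypothetical names introduced for illustration, and the failure is simulated rather than produced by a real request.

```ruby
require "net/http"

# Hypothetical stand-in for RelatonBib::RequestError.
class DemoRequestError < StandardError; end

# The same exception classes that the listed method rescues.
NETWORK_ERRORS = [
  SocketError, Timeout::Error, Errno::EINVAL, Errno::ECONNRESET, EOFError,
  Net::HTTPBadResponse, Net::HTTPHeaderSyntaxError, Net::ProtocolError
].freeze

# Runs the block and converts any listed network failure into one domain error.
def with_request_error(src)
  yield
rescue *NETWORK_ERRORS
  raise DemoRequestError, "Cannot access #{src}"
end

begin
  # Simulate a DNS failure instead of performing a real request.
  with_request_error("http://example.invalid/doc") { raise SocketError, "lookup failed" }
rescue DemoRequestError => e
  puts e.message # prints "Cannot access http://example.invalid/doc"
end
```

A caller then needs to rescue only one exception class, whichever transport-level failure occurred. Exceptions outside the list, such as a JSON or HTML parsing error, are not wrapped and propagate unchanged.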
.scrape_page(text) ⇒ RelatonGb::HitCollection
# File 'lib/relaton_gb/sec_scrapper.rb', line 20

    def scrape_page(text)
      uri = URI "http://www.std.gov.cn/hb/search/hbPage?searchText=#{text}"
      res = JSON.parse Net::HTTP.get(uri)
      hits = res["rows"].map do |r|
        Hit.new pid: r["id"], title: r["STD_CODE"], scrapper: self
      end
      HitCollection.new hits
    rescue SocketError, Timeout::Error, Errno::EINVAL, Errno::ECONNRESET, EOFError,
           Net::HTTPBadResponse, Net::HTTPHeaderSyntaxError, Net::ProtocolError
      raise RelatonBib::RequestError, "Cannot access #{uri}"
    end
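scrape_page reads two keys from each entry of the response's "rows" array: "id" becomes the hit's pid and "STD_CODE" becomes its title. The sketch below reproduces that mapping offline. DemoHit, search_uri, rows_to_hits, and the sample payload are all hypothetical; the real response may carry more fields. Percent-encoding the search text is an addition that the listed source does not perform, shown here because a raw space or non-ASCII character in the query would otherwise make URI() raise.

```ruby
require "json"
require "uri"

# Hypothetical stand-in for RelatonGb::Hit.
DemoHit = Struct.new(:pid, :title, keyword_init: true)

# Builds the search URL with the text percent-encoded (not done in the listed source).
def search_uri(text)
  query = URI.encode_www_form(searchText: text)
  URI("http://www.std.gov.cn/hb/search/hbPage?#{query}")
end

# Maps each row of the decoded JSON body to a hit.
def rows_to_hits(body)
  JSON.parse(body).fetch("rows", []).map do |row|
    DemoHit.new(pid: row["id"], title: row["STD_CODE"])
  end
end

# Made-up payload containing only the two keys the listed method reads.
sample = '{"rows":[{"id":"A1","STD_CODE":"JB/T 0001-2000"}]}'
hits = rows_to_hits(sample)
puts hits.first.pid   # prints "A1"
puts hits.first.title # prints "JB/T 0001-2000"
puts search_uri("JB/T 0001").query
```

Using fetch("rows", []) makes a response without a "rows" key yield an empty result, whereas the listed source would raise NoMethodError on nil in that case.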