Module: Gbbib::SecScrapper

Extended by:
Scrapper
Defined in:
lib/gbbib/sec_scrapper.rb

Overview

Scraper for sector standards.

Class Method Summary

Methods included from Scrapper

get_docid, get_status, get_titles, get_type, scrapped_data

Class Method Details

.scrape_doc(pid) ⇒ Gbbib::GbBibliographicItem

Parameters:

  • pid (String)

    page ID of the standard

Returns:

  • (Gbbib::GbBibliographicItem)

# File 'lib/gbbib/sec_scrapper.rb', line 31

def scrape_doc(pid)
  src = "http://www.std.gov.cn/hb/search/stdHBDetailed?id=#{pid}"
  page_uri = URI src
  doc = Nokogiri::HTML Net::HTTP.get(page_uri)
  GbBibliographicItem.new scrapped_data(doc, src: src)
end
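
The listing above interpolates pid directly into the URL and parses whatever body comes back. As a minimal sketch, assuming the same endpoint, the URL could be built with an encoded query and the HTTP status checked before parsing. The names DETAIL_URL, detail_uri, and fetch_html are hypothetical helpers for illustration and are not part of Gbbib.

```ruby
require 'net/http'
require 'uri'

# Endpoint taken from the scrape_doc listing; the constant name is hypothetical.
DETAIL_URL = 'http://www.std.gov.cn/hb/search/stdHBDetailed'

# Hypothetical helper: build the detail-page URI with a form-encoded id.
def detail_uri(pid)
  uri = URI(DETAIL_URL)
  uri.query = URI.encode_www_form(id: pid)
  uri
end

# Hypothetical helper: fetch the page body, raising on a non-2xx response
# instead of handing an error page to the HTML parser.
def fetch_html(uri)
  res = Net::HTTP.get_response(uri)
  raise "HTTP #{res.code} for #{uri}" unless res.is_a?(Net::HTTPSuccess)

  res.body
end
```

With helpers like these, the body of scrape_doc could pass fetch_html(detail_uri(pid)) to Nokogiri::HTML; whether the library should fail loudly on HTTP errors is a design choice the source does not state.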

.scrape_page(text) ⇒ Gbbib::HitCollection

Parameters:

  • text (String)

    code of the standard to search for

Returns:

  • (Gbbib::HitCollection)

# File 'lib/gbbib/sec_scrapper.rb', line 20

def scrape_page(text)
  uri = URI "http://www.std.gov.cn/hb/search/hbPage?searchText=#{text}"
  res = JSON.parse Net::HTTP.get(uri)
  hits = res['rows'].map do |r|
    Hit.new pid: r['id'], title: r['STD_CODE'], scrapper: self
  end
  HitCollection.new hits
end
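
Standard codes often contain spaces and slashes, which the listing above places in the query string unencoded. The sketch below shows one way to encode the search text and to map a response of the shape scrape_page expects (a 'rows' array with 'id' and 'STD_CODE' keys) to hits. The Hit struct is a stand-in for Gbbib::Hit, search_uri and hits_from are hypothetical helpers, and the sample payload is made up for illustration.

```ruby
require 'json'
require 'uri'

# Hypothetical stand-in for Gbbib::Hit, used only in this illustration.
Hit = Struct.new(:pid, :title, keyword_init: true)

# Hypothetical helper: build the search URI with a form-encoded searchText.
def search_uri(text)
  query = URI.encode_www_form(searchText: text)
  URI("http://www.std.gov.cn/hb/search/hbPage?#{query}")
end

# Hypothetical helper: map a JSON response body to hits, mirroring the
# res['rows'].map step in scrape_page. A missing 'rows' key yields no hits.
def hits_from(body)
  JSON.parse(body).fetch('rows', []).map do |r|
    Hit.new(pid: r['id'], title: r['STD_CODE'])
  end
end

# Made-up payload in the shape scrape_page reads.
sample = '{"rows":[{"id":"ABC123","STD_CODE":"JB/T 13368-2018"}]}'
hits = hits_from(sample)
uri = search_uri('JB/T 13368')
```

Using fetch with a default avoids a NoMethodError when the response has no 'rows' key; the original listing would raise in that case.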