Module: Gbbib::SecScrapper

Extended by:
Scrapper
Defined in:
lib/gbbib/sec_scrapper.rb

Overview

Scraper for sector standards published on www.std.gov.cn.

Class Method Summary

  • .scrape_doc(pid) ⇒ Gbbib::GbBibliographicItem
  • .scrape_page(text) ⇒ Gbbib::HitCollection

Methods included from Scrapper

get_contributors, get_docid, get_status, get_titles, get_type, scrapped_data

Class Method Details

.scrape_doc(pid) ⇒ Gbbib::GbBibliographicItem

Parameters:

  • pid (String)

    standard’s page id

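Returns:

  • (Gbbib::GbBibliographicItem)

    bibliographic item for the standard, or nil if fetching or parsing the page fails
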
# File 'lib/gbbib/sec_scrapper.rb', line 35

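def scrape_doc(pid)
  # Detail page for a single sector standard.
  src = "http://www.std.gov.cn/hb/search/stdHBDetailed?id=#{pid}"
  page_uri = URI src
  begin
    doc = Nokogiri::HTML Net::HTTP.get(page_uri)
    GbBibliographicItem.new scrapped_data(doc, src: src)
  rescue StandardError
    # warn returns nil, so the caller receives nil when the page cannot be read.
    warn "Cannot access #{src}"
  end
end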
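
Because the rescue branch of scrape_doc ends with a call to warn, the method yields nil rather than raising when the page cannot be fetched. The sketch below reproduces only that control flow; fetch_or_warn and the example.invalid URLs are hypothetical stand-ins, not part of Gbbib.

```ruby
# Hypothetical helper that mimics the begin/rescue/warn shape of scrape_doc.
def fetch_or_warn(src)
  yield
rescue StandardError
  # Kernel#warn prints to stderr and returns nil.
  warn "Cannot access #{src}"
end

# Successful fetch: the block's value is returned unchanged.
ok = fetch_or_warn('http://example.invalid/ok') { :item }

# Failed fetch: the error is swallowed and nil comes back instead.
failed = fetch_or_warn('http://example.invalid/down') { raise IOError, 'unreachable' }

puts ok.inspect      # :item
puts failed.inspect  # nil
```

A caller would therefore check the result for nil before using it as a bibliographic item.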

.scrape_page(text) ⇒ Gbbib::HitCollection

Parameters:

  • text (String)

    code of the standard to search for

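Returns:

  • (Gbbib::HitCollection)

    collection of search hits, or nil if the search request or JSON parsing fails
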
# File 'lib/gbbib/sec_scrapper.rb', line 20

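def scrape_page(text)
  # Percent-encode the query so codes containing spaces or slashes form a valid URI.
  query = URI.encode_www_form_component(text)
  uri = URI "http://www.std.gov.cn/hb/search/hbPage?searchText=#{query}"
  begin
    res = JSON.parse Net::HTTP.get(uri)
    # Each row of the JSON response becomes one Hit.
    hits = res['rows'].map do |r|
      Hit.new pid: r['id'], title: r['STD_CODE'], scrapper: self
    end
    HitCollection.new hits
  rescue StandardError
    warn "Cannot access #{uri}"
  end
end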
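
The mapping from the search response to hits can be tried offline. This is a minimal sketch, assuming a response body shaped the way scrape_page reads it (a "rows" array whose entries carry "id" and "STD_CODE" keys). SketchHit, rows_to_hits, and the sample values are made up for illustration and are not part of Gbbib.

```ruby
require 'json'

# Stand-in for Gbbib::Hit, for illustration only.
SketchHit = Struct.new(:pid, :title, keyword_init: true)

# Hypothetical response body; the ids and codes are invented.
SAMPLE_BODY = <<~JSON
  {"rows": [
    {"id": "ABC123", "STD_CODE": "XX/T 0001-2000"},
    {"id": "DEF456", "STD_CODE": "XX/T 0002-2001"}
  ]}
JSON

# Turn each row of the parsed response into one hit.
def rows_to_hits(body)
  rows = JSON.parse(body).fetch('rows', [])
  rows.map { |r| SketchHit.new(pid: r['id'], title: r['STD_CODE']) }
end

hits = rows_to_hits(SAMPLE_BODY)
hits.each { |h| puts "#{h.pid}: #{h.title}" }
```

Each pid collected this way is the kind of value that scrape_doc expects as its argument.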