Module: Gbbib::GbScrapper

Extended by:
Scrapper
Defined in:
lib/gbbib/gb_scrapper.rb

Overview

National standard scraper.

Class Method Summary

Methods included from Scrapper

get_docid, get_status, get_titles, get_type, scrapped_data

Class Method Details

.get_committee(doc) ⇒ Hash

Returns the committee as a Hash with the keys :type [String] and :name [String].

Parameters:

  • doc (Nokogiri::HTML)

Returns:

  • (Hash)
    • :type [String]

    • :name [String]



# File 'lib/gbbib/gb_scrapper.rb', line 43

def get_committee(doc)
  name = doc.xpath('//p/a[1]/following-sibling::text()').text
            .match(/(?<=\()[^)]+/).to_s
  { type: 'technical', name: name }
end
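
The extraction step in get_committee can be tried without a network connection. The following is a minimal sketch, assuming the text node after the first link has the form "name (committee code)". The sample string and the PAREN_CONTENT constant are hypothetical and not part of Gbbib.

```ruby
# Hypothetical text node; the real page content may differ.
sample_text = ' Example Committee (TC 000) '

# Lookbehind for "(" followed by every character up to,
# but not including, the next ")".
PAREN_CONTENT = /(?<=\()[^)]+/

committee = { type: 'technical', name: sample_text.match(PAREN_CONTENT).to_s }

# With no parentheses, match returns nil and nil.to_s is an empty string.
missing = 'no parentheses here'.match(PAREN_CONTENT).to_s

p committee
p missing
```

If the live page uses full-width parentheses rather than ASCII ones, the lookbehind and the character class would need to include those characters as well.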

.scrape_doc(pid) ⇒ Gbbib::GbBibliographicItem

Parameters:

  • pid (String)

    standard’s page id

Returns:

  • (Gbbib::GbBibliographicItem)

# File 'lib/gbbib/gb_scrapper.rb', line 33

def scrape_doc(pid)
  src = 'http://www.std.gov.cn/gb/search/gbDetailed?id=' + pid
  doc = Nokogiri::HTML OpenURI.open_uri(src)
  GbBibliographicItem.new scrapped_data(doc, src: src)
end

.scrape_page(text) ⇒ Gbbib::HitCollection

Parameters:

  • text (String)

    code of the standard to search for

Returns:

  • (Gbbib::HitCollection)

# File 'lib/gbbib/gb_scrapper.rb', line 20

def scrape_page(text)
  search_html = OpenURI.open_uri(
    'http://www.std.gov.cn/search/stdPage?q=' + text
  )
  result = Nokogiri::HTML search_html
  hits = result.css('.s-title a').map do |h|
    Hit.new pid: h[:pid], title: h.text, scrapper: self
  end
  HitCollection.new hits
end
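
scrape_page appends the search text to the URL unchanged, and standard codes such as "GB/T 20223" contain a slash and a space. A minimal sketch of escaping the query first is shown below. The helper search_url and the SEARCH_URL constant are hypothetical and not part of Gbbib, and it is an assumption that the server accepts form-style encoding, where a space becomes "+".

```ruby
require 'uri'

# Base URL taken from scrape_page above.
SEARCH_URL = 'http://www.std.gov.cn/search/stdPage?q='

# Hypothetical helper: builds the search URL with the query escaped.
def search_url(text)
  SEARCH_URL + URI.encode_www_form_component(text)
end

puts search_url('GB/T 20223')
# prints http://www.std.gov.cn/search/stdPage?q=GB%2FT+20223
```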