Module: RelatonItu::Scrapper
- Defined in:
- lib/relaton_itu/scrapper.rb
Overview
Scrapper. Fetches publication pages and converts them into ItuBibliographicItem objects.
Constant Summary
- ROMAN_MONTHS =
%w[I II III IV V VI VII VIII IX X XI XII].freeze
- TYPES =
{
  "ISO" => "international-standard",
  "TS" => "technicalSpecification",
  "TR" => "technicalReport",
  "PAS" => "publiclyAvailableSpecification",
  "AWI" => "appruvedWorkItem",
  "CD" => "committeeDraft",
  "FDIS" => "finalDraftInternationalStandard",
  "NP" => "newProposal",
  "DIS" => "draftInternationalStandard",
  "WD" => "workingDraft",
  "R" => "recommendation",
  "Guide" => "guide",
}.freeze
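Both constants are plain lookup tables. This page does not show how the module reads them, so the following is only a minimal sketch of the kind of lookup they allow. The `ScrapperSketch` module and its two helper methods are hypothetical names introduced for illustration, and `TYPES` is copied here in abbreviated form so the snippet runs without the gem.

```ruby
# Hypothetical stand-in module; not part of relaton_itu.
module ScrapperSketch
  ROMAN_MONTHS = %w[I II III IV V VI VII VIII IX X XI XII].freeze

  # Abbreviated copy of TYPES, for illustration only.
  TYPES = {
    "TR" => "technicalReport",
    "R" => "recommendation",
    "Guide" => "guide",
  }.freeze

  module_function

  # Returns the 1-based month number for a Roman numeral, or nil if unknown.
  def roman_month_to_number(roman)
    index = ROMAN_MONTHS.index(roman.to_s.upcase)
    index && index + 1
  end

  # Returns the document type name for an abbreviation, or nil if unknown.
  def doctype_for(abbreviation)
    TYPES[abbreviation]
  end
end
```

Because both tables are frozen, a lookup for an unknown key simply yields nil rather than mutating anything, so callers should be prepared for a nil result.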
Class Method Summary
- .parse_page(hit_data, imp = false) ⇒ Hash
Parse page.
Class Method Details
.parse_page(hit_data, imp = false) ⇒ Hash
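Parse page. Fetches the page at hit_data[:url] and builds a bibliographic item from it. When imp is true, the method first looks for a link inside the span whose id contains 'tab_ig_uc_rec', returns nil if there is none, and otherwise parses the linked page instead. The method body returns an ItuBibliographicItem (or nil in the case just described), not a plain Hash.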
# File 'lib/relaton_itu/scrapper.rb', line 32

def parse_page(hit_data, imp = false)
  url, doc = get_page hit_data[:url]
  if imp
    a = doc.at "//span[contains(@id, 'tab_ig_uc_rec')]/a"
    return unless a

    url, doc = get_page URI.join(url, a[:href]).to_s
  end

  # Fetch edition.
  edition = doc.at("//table/tr/td/span[contains(@id, 'Label8')]/b")&.text

  ItuBibliographicItem.new(
    fetched: Date.today.to_s,
    type: "standard",
    docid: fetch_docid(doc),
    edition: edition,
    language: ["en"],
    script: ["Latn"],
    title: fetch_titles(doc),
    doctype: hit_data[:type],
    docstatus: fetch_status(doc),
    ics: [], # fetch_ics(doc),
    date: fetch_dates(doc),
    contributor: fetch_contributors(hit_data[:code]),
    editorialgroup: fetch_workgroup(hit_data[:code], doc),
    abstract: fetch_abstract(doc),
    copyright: fetch_copyright(hit_data[:code], doc),
    link: fetch_link(doc, url),
    relation: fetch_relations(doc),
    place: ["Geneva"],
  )
end
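When imp is true, parse_page resolves the href of the matched link against the URL of the page it was found on, using URI.join from Ruby's standard library. The sketch below isolates that one step. The `LinkSketch` module, its `resolve` method, and the example.org URLs are hypothetical and chosen only for illustration; they are not real ITU addresses.

```ruby
require "uri"

# Hypothetical stand-in module; not part of relaton_itu.
module LinkSketch
  module_function

  # Resolves an href found on a page against that page's URL,
  # mirroring the URI.join(url, a[:href]).to_s call in parse_page.
  def resolve(page_url, href)
    URI.join(page_url, href).to_s
  end
end

page_url = "https://example.org/rec/T-REC-X.1/en" # illustrative URL only

# A root-relative href replaces the whole path of the page URL.
root_relative = LinkSketch.resolve(page_url, "/rec/T-REC-X.1-IMP/en")

# An absolute href is returned unchanged.
absolute = LinkSketch.resolve(page_url, "https://example.org/other")
```

Resolving against the page URL means the second get_page call receives a full URL whether the link on the page is written as a relative or an absolute reference.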