Class: Lobbyliste::Downloader
Overview
This class finds the Lobbyliste PDF on the Bundestag website, downloads it, and extracts the PDF content.
Instance Method Summary
- #fetch_pdf_link(bundestag_page = "https://www.bundestag.de/parlament/lobbyliste") ⇒ String
  Since the link to the PDF changes with every new version, we download the Lobbyliste website and extract the link. Use this method to extract the link from a different page if the Bundestag website structure changes again.
- #html_data ⇒ String
  Extracted content of the PDF file in HTML format.
- #initialize(pdf_link = nil) ⇒ Downloader
  Constructor. Creates a new Downloader.
- #pdf_data ⇒ String
  Raw content of the PDF file.
- #pdf_link ⇒ String
  Link to the Lobbyliste PDF.
- #text_data ⇒ String
  Extracted content of the PDF file.
Constructor Details
#initialize(pdf_link = nil) ⇒ Downloader
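Creates a new Downloader. If pdf_link is nil, the link is looked up on the Bundestag website the first time #pdf_link is called.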
# File 'lib/lobbyliste/downloader.rb', line 11
def initialize(pdf_link=nil)
  @pdf_link = pdf_link
end
Instance Method Details
#fetch_pdf_link(bundestag_page = "https://www.bundestag.de/parlament/lobbyliste") ⇒ String
Since the link to the PDF changes with every new version, we download the Lobbyliste website and extract the link. Use this method to extract the link from a different page if the Bundestag website structure changes again. Raises NoPdfLinkFound if the page contains no matching link.
# File 'lib/lobbyliste/downloader.rb', line 44
def fetch_pdf_link(bundestag_page = "https://www.bundestag.de/parlament/lobbyliste")
  website = Nokogiri::HTML(open(bundestag_page))
  link = website.css("a[title^='Aktuelle Fassung']").first
  raise NoPdfLinkFound.new("Could not find link to the Lobbyist PDF on the bundestag website!") unless link
  @pdf_link = "https://bundestag.de#{link['href']}"
end
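The lookup logic above can be illustrated without network access or Nokogiri. The following is a minimal sketch, not the library's implementation: `extract_pdf_link`, `SAMPLE_HTML`, the href path, and the top-level `NoPdfLinkFound` class are all hypothetical stand-ins, and a regular expression replaces the CSS selector `a[title^='Aktuelle Fassung']` purely to keep the example dependency-free.

```ruby
# Hypothetical stand-in for the library's NoPdfLinkFound error.
class NoPdfLinkFound < StandardError; end

# Made-up markup; the real Bundestag page and its href paths will differ.
SAMPLE_HTML = <<~HTML
  <a href="/other" title="Archiv">Archiv</a>
  <a href="/resource/blob/123/lobbyliste.pdf" title="Aktuelle Fassung der Liste">PDF</a>
HTML

# Hypothetical helper: returns an absolute URL for the first <a> tag whose
# title attribute starts with "Aktuelle Fassung". A regex is used here only
# as a simplified replacement for the Nokogiri CSS selector.
def extract_pdf_link(html, host = "https://bundestag.de")
  html.scan(/<a\b[^>]*>/i).each do |tag|
    title = tag[/\btitle\s*=\s*(["'])(.*?)\1/i, 2]
    next unless title && title.start_with?("Aktuelle Fassung")

    href = tag[/\bhref\s*=\s*(["'])(.*?)\1/i, 2]
    return "#{host}#{href}" if href
  end
  raise NoPdfLinkFound, "Could not find link to the Lobbyist PDF"
end

puts extract_pdf_link(SAMPLE_HTML)
# prints "https://bundestag.de/resource/blob/123/lobbyliste.pdf"
```

As in the original method, the href is treated as a path relative to the host, and a missing link is reported with an exception rather than a nil return value.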
#html_data ⇒ String
Returns the extracted content of the PDF file in HTML format. The extraction runs on first access; later calls return the cached result.
# File 'lib/lobbyliste/downloader.rb', line 29
def html_data
  extract_pdf unless @html_data
  @html_data
end
#pdf_data ⇒ String
Returns the raw content of the PDF file. The PDF is retrieved on first access; later calls return the cached data.
# File 'lib/lobbyliste/downloader.rb', line 16
def pdf_data
  retrieve_pdf unless @pdf_data
  @pdf_data
end
#pdf_link ⇒ String
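Returns the link to the Lobbyliste PDF. If no link was passed to the constructor, it is fetched from the Bundestag website on first access.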
# File 'lib/lobbyliste/downloader.rb', line 35
def pdf_link
  fetch_pdf_link unless @pdf_link
  @pdf_link
end
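The accessors on this page share one pattern: compute the value on first access, store it in an instance variable, and return the stored value afterwards. The sketch below illustrates that pattern in isolation. `LazyDocument`, `fetch_count`, and the example.org URLs are hypothetical and not part of the library; the counter exists only to make the caching visible.

```ruby
# Hypothetical class illustrating the "fetch unless already set" pattern.
class LazyDocument
  attr_reader :fetch_count

  # A preset link skips the lookup entirely, as with Downloader.new(pdf_link).
  def initialize(link = nil)
    @link = link
    @fetch_count = 0
  end

  # Looks the link up only when it has not been set yet.
  def link
    fetch_link unless @link
    @link
  end

  private

  # Placeholder for an expensive lookup such as a website download.
  def fetch_link
    @fetch_count += 1
    @link = "https://example.org/list.pdf"
  end
end

lazy = LazyDocument.new
lazy.link
lazy.link
puts lazy.fetch_count # prints 1: the second call reused the stored link
```

One consequence of this design is that a cached value is never refreshed by the accessor itself; in the documented class, calling #fetch_pdf_link directly is what overwrites @pdf_link.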
#text_data ⇒ String
Returns the extracted text content of the PDF file. The extraction runs on first access; later calls return the cached result.
# File 'lib/lobbyliste/downloader.rb', line 23
def text_data
  extract_pdf unless @text_data
  @text_data
end