Class: Lobbyliste::Downloader

Inherits: Object

Defined in: lib/lobbyliste/downloader.rb

Overview

This class finds the Lobbyliste PDF on the Bundestag website, downloads it, and extracts its content as text and HTML.

Instance Method Summary

#fetch_pdf_link(bundestag_page = "https://www.bundestag.de/parlament/lobbyliste") ⇒ String

Downloads the Lobbyliste page and extracts the link to the current PDF, which changes with every new version.

#html_data ⇒ String

Extracted content of the PDF file in HTML format.

#initialize(pdf_link = nil) ⇒ Downloader

Creates a new Downloader.

#pdf_data ⇒ String

Raw content of the PDF file.

#pdf_link ⇒ String

Link to the Lobbyliste PDF.

#text_data ⇒ String

Extracted content of the PDF file.

Constructor Details

#initialize(pdf_link = nil) ⇒ `Downloader`

Creates a new Downloader.

Parameters:

pdf_link (String, nil): link used to fetch the Lobbyliste PDF. Defaults to nil, in which case #pdf_link looks the link up via #fetch_pdf_link.



# File 'lib/lobbyliste/downloader.rb', line 11

def initialize(pdf_link=nil)
  @pdf_link = pdf_link
end

Instance Method Details

#fetch_pdf_link(bundestag_page = "https://www.bundestag.de/parlament/lobbyliste") ⇒ `String`

The link to the PDF changes with every new version, so this method downloads the Lobbyliste page and extracts the link from it. Pass a different page if the structure of the Bundestag website changes again.

Parameters:

bundestag_page (String): page from which the PDF link is extracted. May change from time to time.

Returns:

(String) —

the link to the Lobbyliste pdf

Raises:

(NoPdfLinkFound)

# File 'lib/lobbyliste/downloader.rb', line 44

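def fetch_pdf_link(bundestag_page = "https://www.bundestag.de/parlament/lobbyliste")
  # URI.open comes from open-uri; bare Kernel#open no longer opens URLs on Ruby 3.0+
  website = Nokogiri::HTML(URI.open(bundestag_page))
  # First anchor whose title attribute starts with "Aktuelle Fassung" ("current version")
  link = website.css("a[title^='Aktuelle Fassung']").first

  raise NoPdfLinkFound.new("Could not find link to the Lobbyist PDF on the bundestag website!") unless link
  @pdf_link = "https://bundestag.de#{link['href']}"
end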
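
The lookup above can be tried offline with a small sketch. It is not the gem's implementation: `extract_pdf_link`, `SAMPLE_HTML`, and the top-level `NoPdfLinkFound` class are hypothetical stand-ins, the sample markup and its PDF path are made up, and a regular expression replaces the Nokogiri selector only to keep the example free of dependencies.

```ruby
# Hypothetical stand-in for the gem's NoPdfLinkFound error class.
class NoPdfLinkFound < StandardError; end

# Hypothetical helper: finds the first <a> tag whose title starts with
# "Aktuelle Fassung" and returns its href prefixed with the host,
# mirroring what the documented method does with the Nokogiri result.
# A regex stands in for the CSS selector a[title^='Aktuelle Fassung'].
def extract_pdf_link(html, base = "https://bundestag.de")
  anchor = html.scan(/<a\b[^>]*>/i).find { |tag| tag =~ /title=(['"])Aktuelle Fassung/ }
  href = anchor && anchor[/href=(['"])(.*?)\1/, 2]
  raise NoPdfLinkFound, "Could not find link to the Lobbyliste PDF" unless href
  "#{base}#{href}"
end

# Made-up page fragment; the real Bundestag markup may differ.
SAMPLE_HTML = <<~HTML
  <ul>
    <li><a href="/archiv.html" title="Archiv">Archiv</a></li>
    <li><a href="/blob/123/lobbyliste.pdf" title="Aktuelle Fassung der Liste">PDF</a></li>
  </ul>
HTML

puts extract_pdf_link(SAMPLE_HTML)

begin
  extract_pdf_link("<p>no links here</p>")
rescue NoPdfLinkFound => e
  puts "lookup failed: #{e.message}"
end
```

Because the host is prepended by plain string interpolation, the approach assumes the `href` is a root-relative path. A caller that expects absolute links on the page would need to handle that case separately.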

#html_data ⇒ `String`

Returns the extracted content of the PDF file in HTML format.

Returns:

(String) —

extracted content of the PDF file in HTML format

# File 'lib/lobbyliste/downloader.rb', line 29

def html_data
  extract_pdf unless @html_data
  @html_data
end

#pdf_data ⇒ `String`

Returns the raw content of the PDF file.

Returns:

(String) —

raw content of the PDF file

# File 'lib/lobbyliste/downloader.rb', line 16

def pdf_data
  retrieve_pdf unless @pdf_data
  @pdf_data
end

#pdf_link ⇒ `String`

Returns the link to the Lobbyliste PDF, looking it up with #fetch_pdf_link if none has been set yet.

Returns:

(String) —

link to the Lobbyliste PDF

# File 'lib/lobbyliste/downloader.rb', line 35

def pdf_link
  fetch_pdf_link unless @pdf_link
  @pdf_link
end
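
The interplay between the constructor argument and the lazy lookup can be illustrated with a stub. `SketchDownloader`, its `fetch_count` counter, and the example.org URLs are hypothetical. The real `#fetch_pdf_link` performs an HTTP request, which is replaced here by a fixed value.

```ruby
# Hypothetical stand-in that mirrors the documented constructor and #pdf_link.
class SketchDownloader
  attr_reader :fetch_count

  def initialize(pdf_link = nil)
    @pdf_link = pdf_link
    @fetch_count = 0
  end

  # Stubbed lookup: stores a made-up URL instead of scraping the website.
  def fetch_pdf_link
    @fetch_count += 1
    @pdf_link = "https://example.org/discovered.pdf"
  end

  # Same shape as the documented method: look the link up only if unset.
  def pdf_link
    fetch_pdf_link unless @pdf_link
    @pdf_link
  end
end

explicit = SketchDownloader.new("https://example.org/pinned.pdf")
explicit.pdf_link          # uses the link passed to the constructor
puts explicit.fetch_count  # 0, no lookup was needed

lazy = SketchDownloader.new
2.times { lazy.pdf_link }  # first call triggers the lookup, second reuses it
puts lazy.fetch_count      # 1
```

Passing a link to the constructor is therefore a way to pin a specific version of the PDF or to avoid the page scrape entirely.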

#text_data ⇒ `String`

Returns the extracted content of the PDF file.

Returns:

(String) —

extracted content of the PDF file

# File 'lib/lobbyliste/downloader.rb', line 23

def text_data
  extract_pdf unless @text_data
  @text_data
end
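
Both #text_data and #html_data call the same `extract_pdf` helper when their cache is empty. That helper is private and not shown on this page, so the sketch below assumes it fills both instance variables in a single pass. `SketchExtractor`, the `extractions` counter, and the fake conversion are hypothetical.

```ruby
# Hypothetical stand-in for the two memoized accessors.
class SketchExtractor
  attr_reader :extractions

  def initialize(pdf_data)
    @pdf_data = pdf_data
    @extractions = 0
  end

  def text_data
    extract_pdf unless @text_data
    @text_data
  end

  def html_data
    extract_pdf unless @html_data
    @html_data
  end

  private

  # Assumed behavior: one extraction pass populates both caches.
  # The "conversion" is a placeholder, not real PDF parsing.
  def extract_pdf
    @extractions += 1
    @text_data = @pdf_data.upcase
    @html_data = "<p>#{@text_data}</p>"
  end
end

extractor = SketchExtractor.new("lobbyliste")
puts extractor.text_data
puts extractor.html_data
puts extractor.extractions
```

Under that assumption, asking for both formats costs only one extraction. The `unless @text_data` guard does mean a nil or false result would trigger the extraction again on the next call.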
