Class: Lobbyliste::Downloader

Inherits:
Object
Defined in:
lib/lobbyliste/downloader.rb

Overview

This class finds the Lobbyliste PDF on the Bundestag website, downloads it and extracts its content.


Constructor Details

#initialize(pdf_link = nil) ⇒ Downloader

Creates a new Downloader.

Parameters:

  • pdf_link (String) (defaults to: nil)

    link used to fetch the Lobbyliste PDF; if nil, the link is looked up with #fetch_pdf_link the first time #pdf_link is called



# File 'lib/lobbyliste/downloader.rb', line 11

def initialize(pdf_link=nil)
  @pdf_link = pdf_link
end

Instance Method Details

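#fetch_pdf_link(bundestag_page = "https://www.bundestag.de/parlament/lobbyliste") ⇒ String

Since the link to the PDF changes with every new version, this method downloads the Lobbyliste page of the Bundestag website and extracts the link from it. The link is stored, so later calls to #pdf_link return it without another lookup. If the structure of the Bundestag website changes again, pass a different page to extract the link from.

Parameters:

  • bundestag_page (String) (defaults to: "https://www.bundestag.de/parlament/lobbyliste")

    URL of the page from which the PDF link is extracted. May change from time to time.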

Returns:

  • (String)

    the link to the Lobbyliste PDF

Raises:

  • (NoPdfLinkFound)

    if no link to the PDF can be found on the page

# File 'lib/lobbyliste/downloader.rb', line 44

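def fetch_pdf_link(bundestag_page = "https://www.bundestag.de/parlament/lobbyliste")
  # URI.open (from open-uri) reads the URL; plain Kernel#open no longer opens URLs in Ruby 3.0+
  website = Nokogiri::HTML(URI.open(bundestag_page))
  # The current version is linked with a title starting with "Aktuelle Fassung"
  link = website.css("a[title^='Aktuelle Fassung']").first

  raise NoPdfLinkFound.new("Could not find link to the Lobbyist PDF on the bundestag website!") unless link
  # The href is assumed to be site-relative, so the host is prepended
  @pdf_link = "https://bundestag.de#{link['href']}"
end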
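#fetch_pdf_link depends on Nokogiri and a live network request. The stdlib-only sketch below performs the same lookup offline on an HTML string, under two stated assumptions: a simplified regex scan stands in for the CSS selector a[title^='Aktuelle Fassung'] (it only understands quoted attributes, so it is no substitute for a real HTML parser), and URI.join resolves the href against the page URL instead of prepending https://bundestag.de, so an absolute href would also come out right. find_pdf_link and this NoPdfLinkFound class are hypothetical stand-ins, not the gem's API.

```ruby
require "uri"

# Hypothetical stand-in for the library's NoPdfLinkFound error.
class NoPdfLinkFound < StandardError; end

# Hypothetical helper: find the "Aktuelle Fassung" link in an HTML string.
# Simplified stand-in for Nokogiri's a[title^='Aktuelle Fassung'] selector:
# scan every <a ...> tag and read its quoted attributes.
def find_pdf_link(html, page_url)
  html.scan(/<a\s[^>]*>/i).each do |tag|
    attrs = tag.scan(/(\w+)\s*=\s*["']([^"']*)["']/).to_h
    next unless attrs["title"]&.start_with?("Aktuelle Fassung") && attrs["href"]

    # Resolve the href against the page URL (works for relative and absolute hrefs)
    return URI.join(page_url, attrs["href"]).to_s
  end
  raise NoPdfLinkFound, "Could not find link to the Lobbyliste PDF"
end
```

With a page containing an older and a current link, only the one titled "Aktuelle Fassung …" is returned, resolved to an absolute URL on the page's host.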

#html_data ⇒ String

Returns the content extracted from the PDF file, in HTML format. The PDF is extracted on the first call and the result is cached.

Returns:

  • (String)

    the content extracted from the PDF file, in HTML format



# File 'lib/lobbyliste/downloader.rb', line 29

def html_data
  extract_pdf unless @html_data
  @html_data
end

#pdf_data ⇒ String

Returns the raw content of the PDF file. The PDF is retrieved on the first call and the result is cached.

Returns:

  • (String)

    the raw content of the PDF file



# File 'lib/lobbyliste/downloader.rb', line 16

def pdf_data
  retrieve_pdf unless @pdf_data
  @pdf_data
end
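The private retrieve_pdf is not shown on this page. As a sketch only, assuming the download is done with open-uri, the raw PDF can be fetched once and memoized with the same guard #pdf_data uses; reading in "rb" mode keeps the bytes from being reinterpreted as text. PdfFetcher and its retrieve_pdf are hypothetical stand-ins. URI.open also accepts a local file path, which makes the pattern easy to try without network access.

```ruby
require "open-uri"

# Hypothetical stand-in: NOT the gem's retrieve_pdf, which this page does not show.
class PdfFetcher
  def initialize(pdf_link)
    @pdf_link = pdf_link
  end

  # Same lazy pattern as Downloader#pdf_data: retrieve once, then reuse.
  def pdf_data
    retrieve_pdf unless @pdf_data
    @pdf_data
  end

  private

  # A PDF is binary data, so read it in binary ("rb") mode.
  def retrieve_pdf
    @pdf_data = URI.open(@pdf_link, "rb", &:read)
  end
end
```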

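#pdf_link ⇒ String

Returns the link to the Lobbyliste PDF. If no link was passed to the constructor, it is looked up with #fetch_pdf_link on the first call.

Returns:

  • (String)

    the link to the Lobbyliste PDF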



# File 'lib/lobbyliste/downloader.rb', line 35

def pdf_link
  fetch_pdf_link unless @pdf_link
  @pdf_link
end
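Together, the constructor and #pdf_link give a lazy default: a link passed to new is used as-is, and only a missing link triggers the website lookup, at most once. The network-free sketch below mirrors that logic; LazyLink and its lookup block (standing in for the scrape in #fetch_pdf_link) are hypothetical and only illustrate the pattern.

```ruby
# Hypothetical, network-free mirror of Downloader#initialize and #pdf_link.
class LazyLink
  attr_reader :lookups

  # The block stands in for the website scrape done by fetch_pdf_link.
  def initialize(pdf_link = nil, &lookup)
    @pdf_link = pdf_link
    @lookup = lookup
    @lookups = 0
  end

  def pdf_link
    fetch_pdf_link unless @pdf_link
    @pdf_link
  end

  private

  # Counts lookups so the caching behavior can be observed.
  def fetch_pdf_link
    @lookups += 1
    @pdf_link = @lookup.call
  end
end
```

A given link never calls the lookup block; without one, repeated calls to pdf_link run the lookup only once.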

#text_data ⇒ String

Returns the content extracted from the PDF file as text. The PDF is extracted on the first call and the result is cached.

Returns:

  • (String)

    the content extracted from the PDF file



# File 'lib/lobbyliste/downloader.rb', line 23

def text_data
  extract_pdf unless @text_data
  @text_data
end
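#html_data and #text_data both call extract_pdf behind their own guards. The sketch below assumes, as those identical guards suggest but this page does not show, that a single extract_pdf pass fills both instance variables, so whichever reader is called first triggers the only extraction. SharedExtraction and its trivial "conversion" are hypothetical placeholders for the real PDF processing.

```ruby
# Hypothetical sketch: one extraction pass fills two memoized readers.
class SharedExtraction
  attr_reader :extractions

  def initialize(raw)
    @raw = raw
    @extractions = 0
  end

  def html_data
    extract_pdf unless @html_data
    @html_data
  end

  def text_data
    extract_pdf unless @text_data
    @text_data
  end

  private

  # Placeholder for the real PDF conversion; sets BOTH results at once.
  def extract_pdf
    @extractions += 1
    @text_data = @raw.strip
    @html_data = "<p>#{@text_data}</p>"
  end
end
```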