Module: Extraction

Included in:
Blogbot
Defined in:
lib/blogbot/extraction.rb

Overview

Adds the capability to extract data from a webpage in an organized format.

Instance Method Summary

Instance Method Details

#display_links ⇒ Object



# File 'lib/blogbot/extraction.rb', line 19

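def display_links
  puts '-' * 50
  # Print each link hash as "KEY: value" lines, separated by a blank line.
  @popular_links.each do |hash|
    hash.each do |k, v|
      puts "#{k.upcase}: #{v}"
    end
    puts
  end
  puts '-' * 50
  # Return the collected links so callers can reuse them.
  @popular_links
end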
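
As a rough illustration of the output layout, the sketch below reproduces the formatting loop outside the module. The `format_links` helper and the sample data are hypothetical and not part of Blogbot; only the `{title:, link:}` hash shape and the `KEY: value` layout are taken from the source above.

```ruby
# Hypothetical standalone helper mirroring the formatting in display_links.
# It returns the lines instead of printing them, so the result is easy to inspect.
def format_links(popular_links)
  rule = '-' * 50
  lines = [rule]
  popular_links.each do |hash|
    hash.each { |k, v| lines << "#{k.upcase}: #{v}" }
    lines << '' # blank line between entries
  end
  lines << rule
end

# Sample data (assumed for illustration only).
sample = [{ title: 'First post', link: 'https://example.com/first' }]
puts format_links(sample)
```

Returning the lines rather than printing inside the helper is a choice made for this sketch; the documented method prints directly and returns `@popular_links`.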

#extract(url) ⇒ Object



# File 'lib/blogbot/extraction.rb', line 31

def extract(url)
  reset
  puts "\nExtracting ...\n"
  scan url
  locate_popular_links
  extract_links
  # Report an error when no links were collected; otherwise print them.
  @popular_links.nil? ? simple_error : display_links
end

#extract_links ⇒ Object

Extracts titles and hyperlinks from the element being examined. If a link's text is an empty string (''), the link wraps an <img>. Images typically duplicate other links and are safe to skip.



# File 'lib/blogbot/extraction.rb', line 7

def extract_links
  # Warn when the element holds too few links; extraction still proceeds.
  puts 'Not enough links to extract' unless see_multiple_links?

  @current_element.css('a').each do |a|
    # Skip image-only links (empty text) and placeholder '#' hrefs.
    next if a.text.empty? || a['href'] == '#'

    @popular_links << { title: a.text, link: a['href'] }
  end
end
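
The filtering rule can be tried in isolation with the minimal sketch below. It assumes nothing about Blogbot beyond the loop shown above: `Anchor` is a hypothetical stand-in for a parsed `<a>` node (it only mimics the `text` and `['href']` access used in `extract_links`), and `collect_links` is an illustrative name, not a Blogbot method.

```ruby
# Hypothetical stand-in for a parsed <a> node. It exposes #text and #[]
# the same way the nodes in extract_links are accessed.
class Anchor
  attr_reader :text

  def initialize(text, attrs)
    @text = text
    @attrs = attrs
  end

  def [](name)
    @attrs[name]
  end
end

# Illustrative helper: keep anchors that have visible text and a real href.
def collect_links(anchors)
  anchors.each_with_object([]) do |a, links|
    next if a.text.empty? || a['href'] == '#'

    links << { title: a.text, link: a['href'] }
  end
end

anchors = [
  Anchor.new('Popular post', { 'href' => '/popular-post' }),
  Anchor.new('', { 'href' => '/popular-post' }), # image-only link, no text
  Anchor.new('Back to top', { 'href' => '#' })   # placeholder link
]
p collect_links(anchors)
```

Only the first anchor survives the filter: the second has empty text, which the description above treats as an image link, and the third points at `#`.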