Module: RateBeer::Scraping

Included in:
Beer, Brewery, Location, Search, Style
Defined in:
lib/ratebeer/scraping.rb

Overview

The Scraping module provides helper methods for scraping pages from RateBeer.com and for processing the results.

Defined Under Namespace

Classes: PageNotFoundError

Instance Attribute Details

#id ⇒ Object (readonly)

Returns the value of attribute id.



# File 'lib/ratebeer/scraping.rb', line 13

def id
  @id
end

Class Method Details

.included(base) ⇒ Object

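Runs when the module is included in a class. Defines a reader method for each of the including class's data_keys; each reader calls retrieve_details the first time it is used if its instance variable has not yet been set.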



# File 'lib/ratebeer/scraping.rb', line 16

def self.included(base)
  base.data_keys.each do |attr|
    define_method(attr) do
      unless instance_variable_defined?("@#{attr}")
        retrieve_details
      end
      instance_variable_get("@#{attr}")
    end
  end
end
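
The lazy-reader pattern used by this hook can be illustrated with a minimal, self-contained sketch. The names LazyReaders and Widget are hypothetical, and retrieve_details here sets canned values rather than scraping a page. The sketch assumes, as the hook does, that the including class defines data_keys before the module is included.

```ruby
# Hypothetical stand-in for the Scraping module: defines one lazy reader
# per key returned by the including class's data_keys.
module LazyReaders
  def self.included(base)
    base.data_keys.each do |attr|
      define_method(attr) do
        # Fetch all details the first time an unset attribute is read.
        retrieve_details unless instance_variable_defined?("@#{attr}")
        instance_variable_get("@#{attr}")
      end
    end
  end
end

# Hypothetical including class; data_keys must exist before `include`.
class Widget
  def self.data_keys
    [:name, :colour]
  end

  include LazyReaders

  attr_reader :fetches

  def initialize
    @fetches = 0
  end

  private

  # Stand-in for a real page scrape: sets every data attribute at once.
  def retrieve_details
    @fetches += 1
    @name   = "Example"
    @colour = "amber"
  end
end

widget = Widget.new
widget.name    # first read triggers retrieve_details
widget.colour  # already set, so no second fetch
```

Note that define_method is called on the module itself, so the readers live on the module rather than on the individual including class.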

.nbsp ⇒ Object

Emulate the &nbsp; (non-breaking space) character for stripping, substitution, etc.



# File 'lib/ratebeer/scraping.rb', line 110

def nbsp
  Nokogiri::HTML("&nbsp;").text
end

.noko_doc(url) ⇒ Object

Create a Nokogiri document from a URL. Raises PageNotFoundError if the request fails with an HTTP error.



# File 'lib/ratebeer/scraping.rb', line 98

def noko_doc(url)
  begin
    Nokogiri::HTML(open(url).read)
  rescue OpenURI::HTTPError => msg
    raise PageNotFoundError.new("Page not found - #{url}")
  end
end
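
The error translation in this method can be sketched without Nokogiri or network access. In the sketch below, fetch_or_not_found and the locally defined PageNotFoundError are hypothetical stand-ins, and the fetch itself is supplied as a block so that the example stays self-contained. It may also be worth noting that from Ruby 3.0 onwards Kernel#open no longer opens URLs, so a modern equivalent of the listing would likely call URI.open instead.

```ruby
require "open-uri"
require "stringio"

# Hypothetical stand-in for RateBeer::Scraping::PageNotFoundError.
class PageNotFoundError < StandardError; end

# Yields the URL to the given block and re-raises any OpenURI::HTTPError
# as a PageNotFoundError carrying the URL in its message.
def fetch_or_not_found(url)
  yield url
rescue OpenURI::HTTPError
  raise PageNotFoundError, "Page not found - #{url}"
end

# A fake fetcher that always fails the way open-uri does on a 404.
failing_fetch = lambda do |_url|
  raise OpenURI::HTTPError.new("404 Not Found", StringIO.new)
end

message =
  begin
    fetch_or_not_found("https://example.invalid/missing", &failing_fetch)
  rescue PageNotFoundError => e
    e.message
  end
```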

Instance Method Details

#==(other_entity) ⇒ Object

Return true if other_entity is an instance of the same class and has the same id.



# File 'lib/ratebeer/scraping.rb', line 53

def ==(other_entity)
  other_entity.is_a?(self.class) && id == other_entity.id
end
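
A short sketch of how this comparison behaves, using the hypothetical names IdEqualitySketch, Ale and Maker. Because the constructor accepts either an Integer or a String ID, the last case is worth noting: the IDs are compared with ==, so 1 and "1" are not treated as the same entity.

```ruby
# Hypothetical module reproducing only the equality check.
module IdEqualitySketch
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def ==(other_entity)
    other_entity.is_a?(self.class) && id == other_entity.id
  end
end

class Ale
  include IdEqualitySketch
end

class Maker
  include IdEqualitySketch
end

same_class_same_id = Ale.new(1) == Ale.new(1)    # true
different_class    = Ale.new(1) == Maker.new(1)  # false
mixed_id_types     = Ale.new(1) == Ale.new("1")  # false, since 1 != "1"
```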

#fix_characters(string) ⇒ Object

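Fix characters in a string scraped from the website.

This method replaces problematic characters found in strings scraped from RateBeer.com (non-breaking spaces, stray control characters such as \u0092, and runs of multiple spaces), then strips leading and trailing whitespace. The substitutions use gsub!, so the string passed in is modified in place.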



# File 'lib/ratebeer/scraping.rb', line 127

def fix_characters(string)
  characters = { nbsp     => " ",
                 "\u0093" => "ž",
                 "\u0092" => "'",
                 "\u0096" => "",
                 / {2,}/ => " " }
  characters.each { |c, r| string.gsub!(c, r) }
  string.strip
end
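
The substitution table can be tried out without Nokogiri. The sketch below is a hypothetical, non-mutating variant (clean_scraped_text is not part of the library) that covers only some of the replacements and assumes that the nbsp helper yields U+00A0, which String#strip does not remove on its own.

```ruby
# Assumption: this is the character the nbsp helper returns.
NBSP = "\u00A0"

# Applies each replacement in turn and returns a new, stripped String;
# the argument is left untouched, so frozen strings are accepted.
def clean_scraped_text(string)
  replacements = { NBSP     => " ",
                   "\u0092" => "'",
                   / {2,}/  => " " }
  replacements
    .reduce(string) { |text, (from, to)| text.gsub(from, to) }
    .strip
end

raw     = "Brewer\u0092s#{NBSP}#{NBSP}Choice  ".freeze
cleaned = clean_scraped_text(raw)  # => "Brewer's Choice"
```

Using gsub rather than gsub! trades a few extra allocations for safety with frozen or shared strings.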

#full_details ⇒ Object

Return full details of the scraped entity in a Hash.



# File 'lib/ratebeer/scraping.rb', line 65

def full_details
  data = self.class
             .data_keys
             .map { |k| [k, send("#{k}")] }
             .to_h
  { id:   id,
    url:  url }.merge(data)
end
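
As a rough illustration of the shape of the returned Hash, the following sketch uses a hypothetical SampleEntity class with plain readers in place of scraped attributes; the URL and values are made up.

```ruby
# Hypothetical entity class used only for illustration.
class SampleEntity
  attr_reader :id, :url, :name, :style

  # Keys whose values are gathered into the details Hash.
  def self.data_keys
    [:name, :style]
  end

  def initialize(id, url, name, style)
    @id    = id
    @url   = url
    @name  = name
    @style = style
  end

  def full_details
    data = self.class.data_keys.map { |k| [k, public_send(k)] }.to_h
    # id and url come first; a data key with the same name would
    # override them, because merge prefers the argument's values.
    { id: id, url: url }.merge(data)
  end
end

entity  = SampleEntity.new(42, "https://example.invalid/entity/42",
                           "Sample", "Porter")
details = entity.full_details
# => { id: 42, url: "https://example.invalid/entity/42",
#      name: "Sample", style: "Porter" }
```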

#initialize(id, name: nil, **options) ⇒ Object

Create an instance of a class that includes RateBeer::Scraping.

Requires an ID and optionally accepts a name and further options.

Parameters:

  • id (Integer, String)

    ID# of the entity which is to be retrieved

  • name (String) (defaults to: nil)

    Name of the entity to which ID# relates if known

  • options (Hash)

    Additional attributes for the entity created; each key is stored as an instance variable of the same name



# File 'lib/ratebeer/scraping.rb', line 35

def initialize(id, name: nil, **options)
  @id   = id
  @name = name unless name.nil?
  options.each do |k, v|
    instance_variable_set("@#{k.to_s}", v)
  end
end
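
A minimal sketch of how the constructor treats its arguments, using a hypothetical SketchEntity class and made-up values. The point of interest is that @name is left undefined (rather than set to nil) when no name is given, which matters because the readers defined in .included test instance_variable_defined? before fetching details.

```ruby
# Hypothetical class reproducing only the constructor.
class SketchEntity
  attr_reader :id

  def initialize(id, name: nil, **options)
    @id   = id
    @name = name unless name.nil?
    # Every remaining keyword becomes an instance variable.
    options.each { |key, value| instance_variable_set("@#{key}", value) }
  end
end

named   = SketchEntity.new(1234, name: "Sample Stout", abv: 6.5)
unnamed = SketchEntity.new("1234")

named.instance_variables    # => [:@id, :@name, :@abv]
unnamed.instance_variables  # => [:@id]
```

An option key that is not a valid instance variable name would raise a NameError from instance_variable_set.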

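#inspect ⇒ Object

Return a short string representation containing the class, the ID and, when it has been set, the name.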



# File 'lib/ratebeer/scraping.rb', line 43

def inspect
  val = "#<#{self.class} ##{@id}"
  val << " - #{@name}" if instance_variable_defined?("@name")
  val << ">"
end

#page_count(doc) ⇒ Integer

Determine the number of pages in a document.

Parameters:

  • doc (Nokogiri::HTML::Document)

    Nokogiri document to test for pagination

Returns:

  • (Integer, nil)

    Number of pages in the document, or nil if the document contains no .pagination element



# File 'lib/ratebeer/scraping.rb', line 88

def page_count(doc)
  doc.at_css('.pagination') && doc.at_css('.pagination')
                                  .css('b')
                                  .map(&:text)
                                  .map(&:to_i)
                                  .max
end

#pagination?(doc) ⇒ Boolean

Determine whether the data is paginated.

Parameters:

  • doc (Nokogiri::HTML::Document)

    Nokogiri document to test for pagination

Returns:

  • (Boolean)

    true, if paginated, else false



# File 'lib/ratebeer/scraping.rb', line 79

def pagination?(doc)
  !page_count(doc).nil?
end

#post_request(url, params) ⇒ Object

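Make a POST request to a RateBeer form and return the response body as a Nokogiri document. Because Net::HTTP.post_form expects a URI, url must be a URI object rather than a String.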



# File 'lib/ratebeer/scraping.rb', line 139

def post_request(url, params)
  res = Net::HTTP.post_form(url, params)
  Nokogiri::HTML(res.body)
end

#symbolize_text(text) ⇒ Object

Convert a text key to a Symbol by downcasing it, replacing spaces with underscores, and removing periods.



# File 'lib/ratebeer/scraping.rb', line 118

def symbolize_text(text)
  text.downcase.gsub(' ', '_').gsub('.', '').to_sym
end
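
For illustration, the conversion applied to a few labels. The method body is copied from the listing; the label strings and the rows array are made-up examples, not taken from RateBeer.com.

```ruby
def symbolize_text(text)
  text.downcase.gsub(' ', '_').gsub('.', '').to_sym
end

symbolize_text("ABV")            # => :abv
symbolize_text("Est. Calories")  # => :est_calories

# A typical use: turning scraped label/value pairs into a Hash.
rows    = [["Est. Calories", "150"], ["ABV", "5.0"]]
details = rows.map { |label, value| [symbolize_text(label), value] }.to_h
# => { est_calories: "150", abv: "5.0" }
```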

#to_s ⇒ Object

Return the same string as #inspect.



# File 'lib/ratebeer/scraping.rb', line 49

def to_s
  inspect
end

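#url ⇒ Object

Return the URL of the entity, built by calling the URL helper named after the including class (for example beer_url for Beer) with the entity's id. Returns nil if no such helper is available.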



# File 'lib/ratebeer/scraping.rb', line 57

def url
  @url ||= if respond_to?("#{demodularized_class_name.downcase}_url", id)
             send("#{demodularized_class_name.downcase}_url", id)
           end
end
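
The dispatch on the class name can be sketched as follows. Everything here is hypothetical: demodularized_class_name and the *_url helpers are defined elsewhere in the library and are not shown on this page, so the sketch supplies its own versions with a made-up URL format. Note also that the second argument of respond_to? is an include_all flag; the listing passes id there, which is truthy, so private helpers are considered as well. The sketch passes true explicitly.

```ruby
# Hypothetical module reproducing only the URL lookup.
module UrlLookupSketch
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def url
    helper = "#{demodularized_class_name.downcase}_url"
    # Pass true so that private helpers are also considered.
    @url ||= (send(helper, id) if respond_to?(helper, true))
  end

  private

  # Assumed behaviour: "Demo::Beer" becomes "Beer".
  def demodularized_class_name
    self.class.name.split("::").last
  end
end

module Demo
  class Beer
    include UrlLookupSketch

    private

    # Made-up URL format, for illustration only.
    def beer_url(id)
      "https://example.invalid/beer/#{id}"
    end
  end

  # No style_url helper is defined here, so #url returns nil.
  class Style
    include UrlLookupSketch
  end
end

beer_link  = Demo::Beer.new(7).url   # => "https://example.invalid/beer/7"
style_link = Demo::Style.new(1).url  # => nil
```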