Module: RateBeer::Scraping
Overview
The Scraping module contains a series of methods to assist with scraping pages from RateBeer.com, and dealing with the results.
Defined Under Namespace
Classes: PageNotFoundError
Instance Attribute Summary
- #id ⇒ Object (readonly)
  Returns the value of attribute id.
Class Method Summary
- .included(base) ⇒ Object
  Run method on inclusion in class.
- .nbsp ⇒ Object
  Emulate the &nbsp; (non-breaking space) character for stripping, substitution, etc.
- .noko_doc(url) ⇒ Object
  Create Nokogiri doc from url.
Instance Method Summary
- #==(other_entity) ⇒ Object
- #fix_characters(string) ⇒ Object
  Fix characters in string scraped from website.
- #full_details ⇒ Object
  Return full details of the scraped entity in a Hash.
- #initialize(id, name: nil, **options) ⇒ Object
  Create RateBeer::Scraper instance.
- #inspect ⇒ Object
- #page_count(doc) ⇒ Integer
  Determine the number of pages in a document.
- #pagination?(doc) ⇒ Boolean
  Determine if data is paginated, or not.
- #post_request(url, params) ⇒ Object
  Make POST request to RateBeer form.
- #symbolize_text(text) ⇒ Object
  Convert text keys to symbols.
- #to_s ⇒ Object
- #url ⇒ Object
Instance Attribute Details
#id ⇒ Object (readonly)
Returns the value of attribute id.
    # File 'lib/ratebeer/scraping.rb', line 13
    def id
      @id
    end
Class Method Details
.included(base) ⇒ Object
Run method on inclusion in class.
    # File 'lib/ratebeer/scraping.rb', line 16
    def self.included(base)
      base.data_keys.each do |attr|
        define_method(attr) do
          unless instance_variable_defined?("@#{attr}")
            retrieve_details
          end
          instance_variable_get("@#{attr}")
        end
      end
    end
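The hook above defines one reader per entry in base.data_keys, and each reader calls retrieve_details only when its instance variable has not been set yet. The following is a minimal, self-contained sketch of that lazy-loading pattern. LazyDetails, Beer, the key names, and the body of retrieve_details are hypothetical stand-ins for illustration, not RateBeer's own classes.

```ruby
# Hypothetical stand-in for the inclusion hook shown above.
module LazyDetails
  def self.included(base)
    base.data_keys.each do |attr|
      define_method(attr) do
        # Fetch only when this attribute has never been set.
        retrieve_details unless instance_variable_defined?("@#{attr}")
        instance_variable_get("@#{attr}")
      end
    end
  end
end

class Beer
  # data_keys must exist before the include, because the hook reads it.
  def self.data_keys
    [:name, :style]
  end

  include LazyDetails

  attr_reader :fetches

  def initialize
    @fetches = 0
  end

  private

  # Stand-in for a real page scrape: sets every data key in one pass.
  def retrieve_details
    @fetches += 1
    @name = "Example Stout"
    @style = "Stout"
  end
end

beer = Beer.new
beer.name   # first access triggers retrieve_details
beer.style  # already set by the same call, so no second fetch
puts beer.fetches
```

In this sketch a single scrape fills every attribute, so reading several attributes in a row costs one fetch.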
.nbsp ⇒ Object
Emulate the &nbsp; (non-breaking space) character for stripping, substitution, etc.
    # File 'lib/ratebeer/scraping.rb', line 110
    def nbsp
      Nokogiri::HTML("&nbsp;").text
    end
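A short sketch of why a dedicated non-breaking-space value is useful. It assumes the helper yields U+00A0, the character the HTML entity &nbsp; decodes to, and uses a hypothetical NBSP constant in its place so the example runs without Nokogiri.

```ruby
NBSP = "\u00A0" # hypothetical constant standing in for the nbsp helper

scraped = "#{NBSP}Pale#{NBSP}Ale#{NBSP}"

# String#strip does not treat U+00A0 as whitespace, so it changes nothing here.
untouched = scraped.strip

# Replacing the character explicitly makes strip effective.
cleaned = scraped.gsub(NBSP, " ").strip

puts untouched == scraped
puts cleaned
```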
.noko_doc(url) ⇒ Object
Create Nokogiri doc from url.
    # File 'lib/ratebeer/scraping.rb', line 98
    def noko_doc(url)
      begin
        Nokogiri::HTML(open(url).read)
      rescue OpenURI::HTTPError => msg
        raise PageNotFoundError.new("Page not found - #{url}")
      end
    end
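The method above translates an HTTP failure into the module's own PageNotFoundError. The sketch below isolates that rescue-and-reraise step so it can run offline. The fetcher argument, the fetch_body name, the example URL, and the StandardError superclass are assumptions of this sketch, not part of the library.

```ruby
require "open-uri"
require "stringio"

# Hypothetical stand-in for RateBeer::Scraping::PageNotFoundError.
class PageNotFoundError < StandardError; end

# `fetcher` is any callable that returns a body string or raises
# OpenURI::HTTPError; it replaces the real network call in this sketch.
def fetch_body(url, fetcher)
  fetcher.call(url)
rescue OpenURI::HTTPError
  raise PageNotFoundError, "Page not found - #{url}"
end

# Simulate a 404 without touching the network.
missing = ->(_url) { raise OpenURI::HTTPError.new("404 Not Found", StringIO.new) }

message = begin
  fetch_body("http://example.invalid/beer/0", missing)
rescue PageNotFoundError => e
  e.message
end

puts message
```

Callers then only need to rescue one error class, whatever HTTP library sits underneath.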
Instance Method Details
#==(other_entity) ⇒ Object
    # File 'lib/ratebeer/scraping.rb', line 53
    def ==(other_entity)
      other_entity.is_a?(self.class) && id == other_entity.id
    end
#fix_characters(string) ⇒ Object
Fix characters in string scraped from website.
This method substitutes problematic characters found in strings scraped from RateBeer.com.
    # File 'lib/ratebeer/scraping.rb', line 127
    def fix_characters(string)
      characters = { nbsp => " ",
                     "\u0093" => "ž",
                     "\u0092" => "'",
                     "\u0096" => "–",
                     / {2,}/ => " " }
      characters.each { |c, r| string.gsub!(c, r) }
      string.strip
    end
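Because the method above uses gsub!, it modifies the string passed in and would raise on a frozen string. The following sketch is a non-mutating variant of the same substitution table, offered as an illustration rather than as the library's implementation. NBSP and REPLACEMENTS are hypothetical names, and NBSP assumes the nbsp helper yields U+00A0.

```ruby
NBSP = "\u00A0" # assumed value of the nbsp helper

# Same substitutions as the table above, applied in insertion order.
REPLACEMENTS = {
  NBSP     => " ",
  "\u0093" => "ž",
  "\u0092" => "'",
  "\u0096" => "–",
  / {2,}/  => " "
}.freeze

# Returns a cleaned copy and leaves the argument untouched.
def fix_characters(string)
  REPLACEMENTS.reduce(string) { |text, (from, to)| text.gsub(from, to) }.strip
end

raw = "Brewer\u0092s#{NBSP}#{NBSP}Choice "
fixed = fix_characters(raw)

puts fixed
```

Order matters in both versions: the non-breaking spaces are turned into ordinary spaces first, so the final pattern can collapse the resulting run of spaces.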
#full_details ⇒ Object
Return full details of the scraped entity in a Hash.
    # File 'lib/ratebeer/scraping.rb', line 65
    def full_details
      data = self.class
                 .data_keys
                 .map { |k| [k, send("#{k}")] }
                 .to_h

      { id: id, url: url }.merge(data)
    end
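As a rough illustration of the Hash this method builds, the sketch below reproduces its logic on a hypothetical Brewery class. The class, its keys, its values, and the URL format are invented for the example.

```ruby
# Hypothetical entity with eagerly set attributes, for illustration only.
class Brewery
  def self.data_keys
    [:name, :established]
  end

  attr_reader :id, :url, :name, :established

  def initialize(id)
    @id = id
    @url = "/brewers/example/#{id}/" # made-up URL format
    @name = "Example Brewery"
    @established = 1999
  end

  # Same shape as the method above: id and url first, then every data key.
  def full_details
    data = self.class.data_keys.map { |k| [k, send(k)] }.to_h
    { id: id, url: url }.merge(data)
  end
end

details = Brewery.new(42).full_details
p details.keys
```

Since the data keys are merged last, a data key named :id or :url would overwrite the leading entry.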
#initialize(id, name: nil, **options) ⇒ Object
Create RateBeer::Scraper instance.
Requires an ID#, and optionally accepts a name and options parameters.
    # File 'lib/ratebeer/scraping.rb', line 35
    def initialize(id, name: nil, **options)
      @id = id
      @name = name unless name.nil?
      options.each do |k, v|
        instance_variable_set("@#{k.to_s}", v)
      end
    end
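The constructor turns each extra keyword in **options into an instance variable of the same name. A minimal sketch of that behavior follows, using a hypothetical Entity class and a made-up abv option.

```ruby
# Hypothetical class that copies the constructor logic described above.
class Entity
  def initialize(id, name: nil, **options)
    @id = id
    @name = name unless name.nil? # @name stays undefined when no name is given
    options.each { |k, v| instance_variable_set("@#{k}", v) }
  end
end

preset = Entity.new(1267, name: "Example IPA", abv: 6.5)
bare   = Entity.new(1)

p preset.instance_variables
p bare.instance_variable_defined?(:@name)
```

Values supplied this way are already defined as instance variables, so the lazily defined readers created by the inclusion hook would return them without calling retrieve_details.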
#inspect ⇒ Object
    # File 'lib/ratebeer/scraping.rb', line 43
    def inspect
      val = "#<#{self.class} ##{@id}"
      val << " - #{@name}" if instance_variable_defined?("@name")
      val << ">"
    end
#page_count(doc) ⇒ Integer
Determine the number of pages in a document.
    # File 'lib/ratebeer/scraping.rb', line 88
    def page_count(doc)
      doc.at_css('.pagination') && doc.at_css('.pagination')
                                      .css('b')
                                      .map(&:text)
                                      .map(&:to_i)
                                      .max
    end
#pagination?(doc) ⇒ Boolean
Determine if data is paginated, or not.
    # File 'lib/ratebeer/scraping.rb', line 79
    def pagination?(doc)
      !page_count(doc).nil?
    end
#post_request(url, params) ⇒ Object
Make POST request to RateBeer form. Return a Nokogiri doc.
    # File 'lib/ratebeer/scraping.rb', line 139
    def post_request(url, params)
      res = Net::HTTP.post_form(url, params)
      Nokogiri::HTML(res.body)
    end
#symbolize_text(text) ⇒ Object
Convert text keys to symbols.
    # File 'lib/ratebeer/scraping.rb', line 118
    def symbolize_text(text)
      text.downcase.gsub(' ', '_').gsub('.', '').to_sym
    end
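A quick usage sketch of the conversion above. The input labels are made-up examples and are not taken from RateBeer pages.

```ruby
# Copy of the method above, so the example runs on its own.
def symbolize_text(text)
  text.downcase.gsub(' ', '_').gsub('.', '').to_sym
end

p symbolize_text("Est. Calories") # → :est_calories
p symbolize_text("ABV")           # → :abv
```

Only spaces and periods are handled, so other punctuation in a label would be carried into the symbol unchanged.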
#to_s ⇒ Object
    # File 'lib/ratebeer/scraping.rb', line 49
    def to_s
      inspect
    end
#url ⇒ Object
    # File 'lib/ratebeer/scraping.rb', line 57
    def url
      @url ||= if respond_to?("#{demodularized_class_name.downcase}_url", id)
                 send("#{demodularized_class_name.downcase}_url", id)
               end
    end
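The method above builds a helper name from the class name and calls it only if the object responds to it. The sketch below shows that dispatch on hypothetical classes. demodularized_class_name is not shown on this page, so its body here is an assumption, as are Demo::Beer, Demo::Place, and the URL format. In Ruby, the second argument of respond_to? is the include_all flag, so the sketch passes true explicitly to match private helpers.

```ruby
module Demo
  class Beer
    attr_reader :id

    def initialize(id)
      @id = id
    end

    # Resolve "<class>_url" at runtime and memoize the result.
    def url
      helper = "#{demodularized_class_name.downcase}_url"
      @url ||= (send(helper, id) if respond_to?(helper, true))
    end

    private

    # Assumed behavior: "Demo::Beer" becomes "Beer".
    def demodularized_class_name
      self.class.name.split("::").last
    end

    # Made-up URL helper for this sketch.
    def beer_url(id)
      "/beer/example/#{id}/"
    end
  end

  # No place_url helper exists, so url resolves to nil.
  class Place < Beer; end
end

p Demo::Beer.new(7).url
p Demo::Place.new(7).url
```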