Class: RestaurantWeekBoston::Scraper
Inherits: Object
Includes: Enumerable
Defined in: lib/restaurant_week_boston/scraper.rb
Overview
Scrapes the Restaurant Week site.
Instance Method Summary
- #create_url(opts = {}) ⇒ Object
  opts is a hash of options, with the following keys: :neighborhood - dorchester, back-bay, etc.
- #doc ⇒ Object
  Returns a Nokogiri::HTML::Document parsed from get_html.
- #each(&blk) ⇒ Object
  Iterates over @restaurants.
- #get_html ⇒ Object
  Returns the result of open()ing the url from create_url(), as a String.
- #initialize(opts = {}) ⇒ Scraper (constructor)
  opts is a hash of options, with the following keys: :neighborhood - dorchester, back-bay, etc.
- #special_find(array) ⇒ Object
  Pass in an array of names that =~ (case-insensitive) the ones you're thinking of, and this will get those.
Constructor Details
#initialize(opts = {}) ⇒ Scraper
opts is a hash of options, with the following keys:
- :neighborhood - dorchester, back-bay, etc. (default: :all)
- :meal - lunch, dinner, both, or any (default: :any)
Constructing a Scraper creates a file in your home directory called ".restaurant_week_boston.cache", which holds the HTML from the RWB site so it doesn't have to be fetched on every run. You can delete that file; the next run will just take longer because the HTML has to be re-fetched.
# File 'lib/restaurant_week_boston/scraper.rb', line 17

def initialize(opts = {})
  @url = create_url(opts)
  @dump = File.expand_path('~/.restaurant_week_boston.cache')
  entries = doc().css('.restaurantEntry')
  @restaurants = entries.map{ |entry| Restaurant.new(entry) }
end
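A minimal usage sketch, assuming the library is loaded with require 'restaurant_week_boston' (the require path is inferred from the lib/ layout above); the option values shown are only examples:

require 'restaurant_week_boston'

# Defaults apply when no options are given: all neighborhoods, any meal.
everything = RestaurantWeekBoston::Scraper.new
puts everything.count

# Narrow to one neighborhood and one meal.
back_bay_dinner = RestaurantWeekBoston::Scraper.new(:neighborhood => 'back-bay',
                                                    :meal => :dinner)

# Restaurant#name is used here because special_find below shows it exists.
back_bay_dinner.each { |restaurant| puts restaurant.name }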
Instance Method Details
#create_url(opts = {}) ⇒ Object
opts is a hash of options, with the following keys:
- :neighborhood - dorchester, back-bay, etc. (default: :all)
- :meal - lunch, dinner, both, or any (default: :any)
# File 'lib/restaurant_week_boston/scraper.rb', line 27

def create_url(opts = {})
  # meal: any/lunch/dinner/both
  # &view=all
  default_opts = { :neighborhood => :all, :meal => :any }
  opts = default_opts.merge!(opts)
  sprintf('http://www.restaurantweekboston.com/?neighborhood=%s&meal=%s&view=all',
          opts[:neighborhood].to_s, opts[:meal].to_s)
end
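For illustration, the URLs that sprintf format would produce (a hedged sketch; #create_url is an instance method, and constructing the Scraper already fetches or reads the cached HTML):

require 'restaurant_week_boston'

scraper = RestaurantWeekBoston::Scraper.new

scraper.create_url(:neighborhood => 'dorchester', :meal => :lunch)
# => "http://www.restaurantweekboston.com/?neighborhood=dorchester&meal=lunch&view=all"

# With no options the defaults (:all, :any) are merged in:
scraper.create_url
# => "http://www.restaurantweekboston.com/?neighborhood=all&meal=any&view=all"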
#doc ⇒ Object
Returns a Nokogiri::HTML::Document parsed from get_html. Prints status messages along the way.
# File 'lib/restaurant_week_boston/scraper.rb', line 60

def doc
  # get_html beforehand for good output messages
  html = get_html
  print "Parsing doc..."
  doc = Nokogiri::HTML(html)
  puts "done."
  puts
  doc
end
#each(&blk) ⇒ Object
Iterates over @restaurants. All methods in Enumerable work.
# File 'lib/restaurant_week_boston/scraper.rb', line 39

def each(&blk)
  @restaurants.each(&blk)
end
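Because #each is defined and Enumerable is included, the standard collection methods work on a Scraper; a brief sketch (again assuming require 'restaurant_week_boston' and using Restaurant#name, the only attribute shown elsewhere in this doc):

require 'restaurant_week_boston'

scraper = RestaurantWeekBoston::Scraper.new(:meal => :dinner)

# Enumerable methods such as map, sort, and find_all come for free.
names = scraper.map { |restaurant| restaurant.name }.sort
puts names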
#get_html ⇒ Object
Returns the result of open()ing the url from create_url(), as a String.
# File 'lib/restaurant_week_boston/scraper.rb', line 44

def get_html
  print "Getting doc..."
  if File.size? @dump
    html = File.read(@dump)
  else
    html = open(@url).read()
    f = File.new(@dump, 'w')
    f.write(html)
    f.close()
  end
  puts "done."
  html
end
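Since the fetched HTML is cached in ~/.restaurant_week_boston.cache (see #initialize), a fresh fetch can be forced by deleting the cache before building a new Scraper; a small sketch:

require 'restaurant_week_boston'

# Same path that #initialize expands; deleting it only means the next run is slower.
cache = File.expand_path('~/.restaurant_week_boston.cache')
File.delete(cache) if File.exist?(cache)

scraper = RestaurantWeekBoston::Scraper.new   # re-fetches and re-caches the HTML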
#special_find(array) ⇒ Object
Pass in an array of names that =~ (case-insensitive) the ones you’re thinking of, and this will get those. So, if you’re thinking of Bond, 224 Boston Street, and Artu, you can pass in [‘bond’, ‘boston street’, ‘artu’].
# File 'lib/restaurant_week_boston/scraper.rb', line 74

def special_find(array)
  array.map! do |name|
    /#{Regexp.escape(name)}/i
  end
  results = @restaurants.find_all do |restaurant|
    array.detect{ |regexp| regexp =~ restaurant.name }
  end
  # Add separators
  results.join("\n" + '-' * 80 + "\n")
end
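Using the example names from the description above (and again assuming require 'restaurant_week_boston'); the result is the matching restaurants joined with separator lines, so it can be printed directly:

require 'restaurant_week_boston'

scraper = RestaurantWeekBoston::Scraper.new

# Matches are case-insensitive and partial, so lowercase fragments are enough.
puts scraper.special_find(['bond', 'boston street', 'artu'])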