Class: Ruboty::YMCrawl::Crawler
- Inherits: Object
  - Object
  - Ruboty::YMCrawl::Crawler
- Defined in: lib/ruboty/ymcrawl/crawler.rb
Overview
A class that scrapes images.
Constant Summary
- INDEX_STR =
The string that marks where the index number goes in the JSON file
"{index}"
Instance Method Summary
- #initialize(dir, site_data, wait_time) ⇒ Crawler
constructor
A new instance of Crawler.
- #save_images(original_url) ⇒ Object
Extracts images using the given CSS selectors.
Constructor Details
#initialize(dir, site_data, wait_time) ⇒ Crawler
# File 'lib/ruboty/ymcrawl/crawler.rb', line 118
def initialize(dir, site_data, wait_time)
  HostManager.instance.set_wait_time(wait_time)
  @selectors = {}
  @selectors[:image] = site_data["css"]["image"].map { |s| Selector.new(s) }
  @selectors[:image_title] = site_data["css"]["image_title"].map { |s| Selector.new(s) }
  @selectors[:title] = site_data["css"]["title"].map { |s| Selector.new(s) }
  @selectors[:page_index_max] = site_data["css"]["page_index_max"].map { |s| Selector.new(s) }
  @page_index_min = site_data["page_index_min"]
  @next_page_appendix = (site_data["next_page_appendix"] == nil) ? "" : site_data["next_page_appendix"]
  @dir = dir
end
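As a rough illustration of the site_data hash the constructor reads, the sketch below parses a made-up JSON definition and mirrors the selector mapping. The selector strings, the use of plain strings as selector entries, and the Struct standing in for Selector are all assumptions; the real Selector class and the JSON schema are not shown on this page.

```ruby
require "json"

# Hypothetical site definition. The keys mirror what #initialize reads;
# the selector strings and values are invented for illustration.
json = <<~JSON
  {
    "css": {
      "image": ["div.gallery img"],
      "image_title": ["div.gallery img"],
      "title": ["h1.entry-title"],
      "page_index_max": ["ul.pager li:last-child a"]
    },
    "page_index_min": 1
  }
JSON
site_data = JSON.parse(json)

# Stand-in for the real Selector class, whose interface is not shown here.
selector_stub = Struct.new(:raw)

# Build one list of selector objects per content type, as the constructor does.
selectors = {}
%w[image image_title title page_index_max].each do |key|
  selectors[key.to_sym] = site_data["css"][key].map { |s| selector_stub.new(s) }
end

# "next_page_appendix" is optional; the constructor falls back to "".
appendix = site_data["next_page_appendix"]
next_page_appendix = appendix.nil? ? "" : appendix

puts selectors.keys.inspect
puts next_page_appendix.inspect
```

Note that every key under "css" must be present and hold an array, since the constructor calls map on each one without a nil check.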
Instance Method Details
#save_images(original_url) ⇒ Object
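Extracts images using the given CSS selectors and saves them under a directory named after the page title. Returns the destination directory path.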
# File 'lib/ruboty/ymcrawl/crawler.rb', line 131
def save_images(original_url)
  dst_dir = "#{@dir}/#{get_contents(original_url, :title).first}"
  (@page_index_min..get_page_index_max(original_url) ).each do |page_index|
    url = "#{original_url}#{get_next_page_appendix_with_index(page_index)}"
    get_contents(url, :image).zip(get_contents(url, :image_title)) do |url, title|
      save_image(dst_dir, url, title) unless url == nil
    end
  end
  dst_dir
end
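The loop in #save_images can be approximated without any network access. The sketch below builds one URL per page index and then pairs image URLs with titles, skipping nil URLs as the source does. The example URL, the appendix template, the page range, and the substitution of the "{index}" placeholder via gsub are assumptions; get_next_page_appendix_with_index and get_page_index_max are not shown on this page.

```ruby
# Hypothetical inputs, invented for illustration.
original_url   = "http://example.com/gallery/1"
appendix       = "?page={index}"
page_index_min = 1
page_index_max = 3

# One URL per page index. Replacing the placeholder with gsub is an assumed
# stand-in for get_next_page_appendix_with_index.
page_urls = (page_index_min..page_index_max).map do |page_index|
  suffix = appendix.gsub("{index}", page_index.to_s)
  "#{original_url}#{suffix}"
end

# Pair each image URL with its title, skipping entries whose URL is nil.
image_urls = ["a.jpg", nil, "c.jpg"]
titles     = ["first", "second", "third"]

to_save = []
image_urls.zip(titles) do |image_url, title|
  to_save << [image_url, title] unless image_url.nil?
end

puts page_urls
puts to_save.inspect
```

Array#zip pads with nil when there are fewer titles than images, so a title can be nil even when its URL is not; only a nil URL causes an entry to be skipped.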