Method: Scruber::Core::Crawler#on_page_error

Defined in:
lib/scruber/core/crawler.rb

#on_page_error(&block) ⇒ void

This method returns an undefined value.

Registers a callback that is executed for error pages, such as those returning 404 or 500.

Attention! To prevent an infinite loop, the callback should call one of these methods on the page: page.processed!, page.delete, page.redownload!(0)

Examples:

Processing error page

on_page_error do |page|
  if page.response_body =~ /distil/
    page.redownload!(0)
  elsif page.response_code == 404
    get page.at('a.moved_to').attr('href')
    page.processed!
  else
    page.delete
  end
end

Parameters:

  • block (Proc)

    body of callback



# File 'lib/scruber/core/crawler.rb', line 185

def on_page_error(&block)
  @on_page_error_callback = block
end
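
The method above only stores the block; when and how the crawler invokes it is not shown here. As a rough, self-contained sketch of that store-then-invoke pattern, the following uses hypothetical stand-ins (SketchPage, SketchCrawler, handle_error, and the state attribute are illustration names, not part of Scruber) and assumes the stored block is invoked with a plain call:

```ruby
# Hypothetical stand-in for a page; not Scruber's real page class.
class SketchPage
  attr_reader :response_code, :state

  def initialize(response_code)
    @response_code = response_code
    @state = :pending
  end

  # Mark the page as handled so it is not picked up again.
  def processed!
    @state = :processed
  end

  # Drop the page from the (imaginary) queue.
  def delete
    @state = :deleted
  end
end

# Hypothetical stand-in for the crawler; only on_page_error mirrors the source.
class SketchCrawler
  # Same shape as the documented method: keep the block for later use.
  def on_page_error(&block)
    @on_page_error_callback = block
  end

  # Assumed dispatcher: pass an error page to the stored callback
  # and report the state the callback left the page in.
  def handle_error(page)
    raise ArgumentError, 'no on_page_error callback registered' unless @on_page_error_callback

    @on_page_error_callback.call(page)
    page.state
  end
end

crawler = SketchCrawler.new
crawler.on_page_error do |page|
  if page.response_code == 404
    page.processed!
  else
    page.delete
  end
end

state_404 = crawler.handle_error(SketchPage.new(404))
state_500 = crawler.handle_error(SketchPage.new(500))
```

In this sketch every branch of the callback leaves the page in a final state, which is the property the warning above asks for: a page left as pending would presumably be handed to the callback again.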