Class: RDig::UrlFilters::VisitedUrlFilter

Inherits:
Object
Includes:
MonitorMixin, Singleton
Defined in:
lib/rdig/url_filters.rb

Overview

Keeps track of all URLs visited during a crawl, to avoid indexing pages more than once. Implemented as a thread-safe singleton because it has to be shared between all crawler threads.
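The combination of Singleton and MonitorMixin described above can be sketched in isolation. This is a minimal illustration, not RDig's source: the class name `SketchVisitedFilter` and the method `first_visit?` are hypothetical, and only Ruby's standard library is used.

```ruby
require 'set'
require 'monitor'
require 'singleton'

# Hypothetical stand-in for the real class; names are for illustration only.
class SketchVisitedFilter
  include MonitorMixin
  include Singleton

  def initialize
    @visited = Set.new
    super() # MonitorMixin#initialize sets up the monitor used by #synchronize
  end

  # Returns true only for the first caller that reports a given URL.
  def first_visit?(url)
    synchronize { !@visited.add?(url).nil? }
  end
end

# Eight threads race to claim the same URL through the one shared instance.
results = Array.new(8) do
  Thread.new { SketchVisitedFilter.instance.first_visit?('http://example.com/') }
end.map(&:value)

winners = results.count(true)
puts "threads that saw the URL as new: #{winners}"
```

Because every thread goes through the same instance and the check-and-insert runs inside `synchronize`, only one thread should see the URL as new, which is the property that keeps a page from being indexed twice.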

Instance Method Summary

Constructor Details

#initialize ⇒ VisitedUrlFilter

Returns a new instance of VisitedUrlFilter.



# File 'lib/rdig/url_filters.rb', line 69

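def initialize
  # URLs (as strings) seen so far during the crawl
  @visited_urls = Set.new
  # let MonitorMixin set up the monitor used by #synchronize
  super
end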

Instance Method Details

#apply(document) ⇒ Object

Returns the document if its URL has not been visited yet, nil otherwise. As a side effect, the URL is recorded as visited.



# File 'lib/rdig/url_filters.rb', line 76

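def apply(document)
  # check-and-insert must be atomic across crawler threads
  synchronize do
    # Set#add? returns nil when the URL is already present
    @visited_urls.add?(document.uri.to_s) ? document : nil
  end
end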
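
The return values of #apply follow directly from `Set#add?`, which returns the set when the element was added and nil when it was already present. The sketch below replays that expression outside the class. `SketchDocument` is a hypothetical stand-in that assumes only that a document responds to `uri`, and the monitor is omitted because the example is single-threaded.

```ruby
require 'set'
require 'uri'

# Hypothetical stand-in for an RDig document: only #uri is assumed here.
SketchDocument = Struct.new(:uri)

visited = Set.new

# Same expression as in #apply, minus the monitor (single-threaded sketch).
check = lambda do |document|
  visited.add?(document.uri.to_s) ? document : nil
end

doc = SketchDocument.new(URI.parse('http://example.com/index.html'))

first  = check.call(doc) # URL is new, so the document is passed through
second = check.call(doc) # URL already recorded, so nil is returned

puts first.equal?(doc)
puts second.inspect
```

Note that URLs are compared as plain strings via `uri.to_s`, so two spellings of the same page (for example with and without a trailing slash) would count as different URLs unless they are normalized before reaching the filter.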