Class: RDig::UrlFilters::VisitedUrlFilter
- Inherits:
- Object
- RDig::UrlFilters::VisitedUrlFilter
- Includes:
- MonitorMixin, Singleton
- Defined in:
- lib/rdig/url_filters.rb
Overview
Keeps track of all URLs visited during a crawl so that no page is indexed more than once. It is implemented as a thread-safe singleton because it has to be shared between all crawler threads.
Instance Method Summary
- #apply(document) ⇒ Object
  Returns the document if its URL has not been visited yet, nil otherwise.
- #initialize ⇒ VisitedUrlFilter
  Constructor. Returns a new instance of VisitedUrlFilter.
Constructor Details
#initialize ⇒ VisitedUrlFilter
Returns a new instance of VisitedUrlFilter.
# File 'lib/rdig/url_filters.rb', line 69
def initialize
  @visited_urls = Set.new
  super
end
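Because the class includes Singleton, `new` is private and callers reach the shared filter through `instance`. The following is a minimal sketch of that pattern, using a hypothetical stand-in class (`SketchFilter`, not part of RDig) and an `attr_reader` added only for illustration. It also shows the `super` call that gives MonitorMixin the chance to set up its lock.

```ruby
require 'set'
require 'monitor'
require 'singleton'

# Hypothetical stand-in that mirrors the constructor shown above.
# It is not RDig's class; the reader method exists only for this sketch.
class SketchFilter
  include MonitorMixin
  include Singleton

  attr_reader :visited_urls

  def initialize
    @visited_urls = Set.new
    super # MonitorMixin#initialize sets up the monitor's internal lock
  end
end

a = SketchFilter.instance
b = SketchFilter.instance
same_object = a.equal?(b) # every caller receives the same object

# Singleton makes `new` private, so direct construction fails.
new_is_private = begin
  SketchFilter.new
  false
rescue NoMethodError
  true
end

puts "same object: #{same_object}, new is private: #{new_is_private}"
```

Sharing one instance is what lets all crawler threads consult the same set of visited URLs.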
Instance Method Details
#apply(document) ⇒ Object
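Returns the document if its URL has not been visited yet, nil otherwise. Set#add? returns nil when the URL is already in the set, so the check and the insert happen in one step inside the monitor's synchronize block.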
# File 'lib/rdig/url_filters.rb', line 76
def apply(document)
  synchronize do
    @visited_urls.add?(document.uri.to_s) ? document : nil
  end
end
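To illustrate how the filter behaves on repeated and concurrent visits, here is a self-contained sketch. `SketchVisitedFilter` and `SketchDocument` are hypothetical stand-ins, not RDig classes; the only assumption about a real document is that it responds to `#uri`, as the source above implies. The URLs are placeholders.

```ruby
require 'set'
require 'monitor'
require 'singleton'
require 'uri'

# Hypothetical document stand-in: anything that responds to #uri.
SketchDocument = Struct.new(:uri)

# Stand-in mirroring the source shown above; not RDig's class.
class SketchVisitedFilter
  include MonitorMixin
  include Singleton

  def initialize
    @visited_urls = Set.new
    super
  end

  def apply(document)
    synchronize do
      # Set#add? returns nil when the element is already present.
      @visited_urls.add?(document.uri.to_s) ? document : nil
    end
  end
end

filter = SketchVisitedFilter.instance
doc = SketchDocument.new(URI.parse('http://example.com/a'))

first  = filter.apply(doc) # the document itself: first visit
second = filter.apply(doc) # nil: the URL was already recorded

# Several threads race on one new URL; the monitor lets only one through.
other = SketchDocument.new(URI.parse('http://example.com/b'))
results = Array.new(8) { Thread.new { filter.apply(other) } }.map(&:value)
accepted = results.compact.size

puts "first passed: #{!first.nil?}, second filtered: #{second.nil?}, accepted: #{accepted}"
```

Returning the document or nil lets the filter sit in a chain where a nil result stops further processing of that URL.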