Class: Grell::PageCollection

Inherits:
Object
Defined in:
lib/grell/page_collection.rb

Overview

Keeps a record of every page the crawler has found. When a new URL is found, it is added to this collection, which ensures that each entry is unique. A newly added page starts out as a discovered page; once the crawler navigates to it, it becomes a visited page.

Constructor Details

#initialize(add_match_block) ⇒ PageCollection

The initializer receives add_match_block, a block containing the logic that decides whether a new URL should be added to the collection or is already present. If it is nil, default_add_match is used instead.



# File 'lib/grell/page_collection.rb', line 11

def initialize(add_match_block)
  @collection = []
  @add_match_block = add_match_block || default_add_match
end
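
The uniqueness check described above can be sketched as follows. This is a minimal, self-contained illustration, not Grell's implementation: MatchPage, MatchCollection, the add method shown here, and the two-argument block signature (existing page, candidate page) are all assumptions made for the example.

```ruby
# Hypothetical stand-in for a page; only the field needed here.
MatchPage = Struct.new(:url)

# Hypothetical collection illustrating an injectable match block.
class MatchCollection
  attr_reader :collection

  def initialize(add_match_block = nil)
    @collection = []
    # Fall back to a default comparison when no block is supplied.
    @add_match_block = add_match_block || default_add_match
  end

  # Adds the page unless the match block reports an existing equivalent.
  # Returns true when the page was added, false when it was rejected.
  def add(page)
    duplicate = @collection.any? { |existing| @add_match_block.call(existing, page) }
    @collection << page unless duplicate
    !duplicate
  end

  private

  # Assumed default: two pages match when their URLs are identical.
  def default_add_match
    proc { |existing, candidate| existing.url == candidate.url }
  end
end

# Custom rule (assumption): ignore the query string when comparing URLs.
ignore_query = proc do |existing, candidate|
  existing.url.split('?').first == candidate.url.split('?').first
end

pages = MatchCollection.new(ignore_query)
pages.add(MatchPage.new('http://example.com/a?x=1'))
pages.add(MatchPage.new('http://example.com/a?x=2')) # rejected as a duplicate
pages.add(MatchPage.new('http://example.com/b'))
```

Passing the comparison in as a block keeps the notion of "same page" configurable, so a caller could treat URLs that differ only in their query string as one page or as several.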

Instance Attribute Details

#collection ⇒ Object (readonly)

Returns the value of attribute collection.



# File 'lib/grell/page_collection.rb', line 7

def collection
  @collection
end

Instance Method Details

#create_page(url, parent_id) ⇒ Object



# File 'lib/grell/page_collection.rb', line 16

def create_page(url, parent_id)
  page_id = next_id
  page = Page.new(url, page_id, parent_id)
  add(page)
  page
end

#discovered_pages ⇒ Object



# File 'lib/grell/page_collection.rb', line 27

def discovered_pages
  @collection - visited_pages
end
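
The split between discovered and visited pages is an array difference over the same collection. The sketch below shows that relationship in isolation; FlagPage and its visit! method are hypothetical stand-ins, not part of Grell.

```ruby
# Hypothetical page stand-in that only tracks whether it was visited.
class FlagPage
  attr_reader :url

  def initialize(url)
    @url = url
    @visited = false
  end

  def visit!
    @visited = true
  end

  def visited?
    @visited
  end
end

collection = [FlagPage.new('/a'), FlagPage.new('/b'), FlagPage.new('/c')]
collection[0].visit!

# Pages that have been navigated to.
visited = collection.select(&:visited?)
# Array difference: everything in the collection that is not yet visited.
discovered = collection - visited
```

Because both lists are derived from the collection on each call, a page moves from discovered to visited as soon as its own state changes, with no separate bookkeeping.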

#next_page ⇒ Object



# File 'lib/grell/page_collection.rb', line 31

def next_page
  discovered_pages.sort_by{|page| page.parent_id}.first
end
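
The selection rule in next_page picks the discovered page with the lowest parent_id, and returns nil when nothing is left to visit. The sketch below reproduces that rule on plain data; QueuePage is a hypothetical stand-in, and the ids are made up for the example. Assuming ids are handed out in increasing order of discovery, this favours pages whose parents were found earliest.

```ruby
# Hypothetical page stand-in carrying only the fields used for ordering.
QueuePage = Struct.new(:url, :id, :parent_id)

discovered = [
  QueuePage.new('/deep', 3, 2),
  QueuePage.new('/top', 1, 0),
  QueuePage.new('/mid', 2, 1)
]

# Same expression as in next_page: sort by parent_id, take the first.
next_up = discovered.sort_by { |page| page.parent_id }.first

# min_by selects a lowest-parent_id page without sorting the whole array.
same = discovered.min_by(&:parent_id)

# With no discovered pages left, the expression yields nil.
nothing = [].sort_by { |page| page.parent_id }.first
```

Ruby's sort_by is not guaranteed to be stable, so when several pages share the lowest parent_id, which of them comes first is not defined by this expression.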

#visited_pages ⇒ Object



# File 'lib/grell/page_collection.rb', line 23

def visited_pages
  @collection.select {|page| page.visited?}
end