Module: NewsCrawler::CrawlerModule

Included in:
LinkSelector::SameDomainSelector
Defined in:
lib/news_crawler/crawler_module.rb

Overview

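Include this module in a crawler component to get helper methods for querying and updating the processing state of URLs. State is tracked per including class, keyed by the class name.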
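The sketch below illustrates how state is kept separately per including class. It is not the library's implementation: URLQueue here is a hypothetical in-memory stand-in, the string values of its constants are assumed, CrawlerModule is a trimmed copy of the methods documented on this page (without the NewsCrawler namespace), and TitleExtractor and LinkCollector are made-up class names.

```ruby
# Hypothetical in-memory stand-in for URLQueue; only the calls used below
# are stubbed, and the constant values are assumptions.
module URLQueue
  UNPROCESSED = 'unprocessed'
  PROCESSED   = 'processed'
  @states = Hash.new { |hash, key| hash[key] = {} }

  # Record the state of a URL for one module name.
  def self.mark(module_name, url, state)
    @states[module_name][url] = state
  end

  # Return every URL in the given state (depth is ignored in this stub).
  def self.find_all(module_name, state, _max_depth = -1)
    @states[module_name].select { |_url, s| s == state }.keys
  end
end

# Trimmed copy of the methods documented on this page.
module CrawlerModule
  def mark_processed(url)
    URLQueue.mark(self.class.name, url, URLQueue::PROCESSED)
  end

  def mark_unprocessed(url)
    URLQueue.mark(self.class.name, url, URLQueue::UNPROCESSED)
  end

  def find_unprocessed(max_depth = -1)
    URLQueue.find_all(self.class.name, URLQueue::UNPROCESSED, max_depth)
  end
end

# Two hypothetical components that work on the same URL.
class TitleExtractor
  include CrawlerModule
end

class LinkCollector
  include CrawlerModule
end

titles = TitleExtractor.new
links  = LinkCollector.new
url    = 'http://example.com/news/1'

titles.mark_unprocessed(url)
links.mark_unprocessed(url)
titles.mark_processed(url)

titles_pending = titles.find_unprocessed # TitleExtractor is done with the URL
links_pending  = links.find_unprocessed  # LinkCollector still has it pending
```

Because every call passes self.class.name, marking a URL as processed in one class leaves its state in the other class untouched.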

Instance Method Details

#find_all(state, max_depth = -1) ⇒ Array

Find all visited URLs whose processing state for the current module matches the given state

Parameters:

  • state (String)

    one of unprocessed, processing, processed

  • max_depth (Fixnum) (defaults to: -1)

    maximum URL depth to return (inclusive)

Returns:

  • (Array)

    URL list



# File 'lib/news_crawler/crawler_module.rb', line 51

def find_all(state, max_depth = -1)
  URLQueue.find_all(self.class.name, state, max_depth)
end
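As a rough illustration of the state and max_depth parameters, the sketch below runs find_all against a hypothetical in-memory URLQueue. Two things are assumptions rather than documented behavior: that a negative max_depth (the default -1) means no depth limit, and the add helper used to seed the queue. Indexer is a made-up class name.

```ruby
# Hypothetical in-memory stand-in for URLQueue; each entry records a depth.
module URLQueue
  @entries = Hash.new { |hash, key| hash[key] = [] }

  # Hypothetical helper used only to seed this example.
  def self.add(module_name, url, depth, state)
    @entries[module_name] << { url: url, depth: depth, state: state }
  end

  # Assumption: a negative max_depth disables the depth filter;
  # otherwise the limit is inclusive, as documented above.
  def self.find_all(module_name, state, max_depth = -1)
    @entries[module_name]
      .select { |entry| entry[:state] == state }
      .select { |entry| max_depth < 0 || entry[:depth] <= max_depth }
      .map { |entry| entry[:url] }
  end
end

# Trimmed copy of the method documented above.
module CrawlerModule
  def find_all(state, max_depth = -1)
    URLQueue.find_all(self.class.name, state, max_depth)
  end
end

class Indexer
  include CrawlerModule
end

URLQueue.add('Indexer', 'http://example.com/', 0, 'unprocessed')
URLQueue.add('Indexer', 'http://example.com/a', 1, 'unprocessed')
URLQueue.add('Indexer', 'http://example.com/a/b', 2, 'processed')

indexer = Indexer.new
all_unprocessed = indexer.find_all('unprocessed')    # both unprocessed URLs
shallow         = indexer.find_all('unprocessed', 0) # only the depth-0 URL
```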

#find_one(state, max_depth = -1) ⇒ String?

Find one visited URL whose processing state for the current module matches the given state

Parameters:

  • state (String)

    one of unprocessed, processing, processed

  • max_depth (Fixnum) (defaults to: -1)

    maximum URL depth to return (inclusive)

Returns:

  • (String, nil)

    URL, or nil if no matching URL exists



# File 'lib/news_crawler/crawler_module.rb', line 59

def find_one(state, max_depth = -1)
  URLQueue.find_one(self.class.name, state, max_depth)
end

#find_unprocessed(max_depth = -1) ⇒ Array

Find all visited URLs that the current module has not yet processed

Parameters:

  • max_depth (Fixnum) (defaults to: -1)

    maximum URL depth to return (inclusive)

Returns:

  • (Array)

    URL list



# File 'lib/news_crawler/crawler_module.rb', line 43

def find_unprocessed(max_depth = -1)
  URLQueue.find_all(self.class.name, URLQueue::UNPROCESSED, max_depth)
end

#mark_processed(url) ⇒ Object

Mark the given URL as processed for the current module

Parameters:

  • url (String)


# File 'lib/news_crawler/crawler_module.rb', line 30

def mark_processed(url)
  URLQueue.mark(self.class.name, url, URLQueue::PROCESSED)
end

#mark_unprocessed(url) ⇒ Object

Mark the given URL as unprocessed for the current module

Parameters:

  • url (String)


# File 'lib/news_crawler/crawler_module.rb', line 36

def mark_unprocessed(url)
  URLQueue.mark(self.class.name, url, URLQueue::UNPROCESSED)
end

#next_unprocessed(max_depth = -1) ⇒ String?

Atomically get the next unprocessed URL and mark it as processing

Parameters:

  • max_depth (Fixnum) (defaults to: -1)

    maximum URL depth to return (inclusive)

Returns:

  • (String, nil)

    URL, or nil if no unprocessed URL exists



# File 'lib/news_crawler/crawler_module.rb', line 66

def next_unprocessed(max_depth = -1)
  URLQueue.next_unprocessed(self.class.name, max_depth)
end
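A typical way to combine next_unprocessed with mark_processed is a worker loop that claims one URL at a time until none remain. The sketch below shows that pattern under stated assumptions: URLQueue is a hypothetical in-memory stand-in that uses a Mutex to make the claim atomic (how the real queue achieves atomicity is not covered on this page), the constant values are assumed, and Downloader is a made-up class name.

```ruby
# Hypothetical in-memory stand-in for URLQueue; constant values are assumptions.
module URLQueue
  UNPROCESSED = 'unprocessed'
  PROCESSING  = 'processing'
  PROCESSED   = 'processed'
  @states = Hash.new { |hash, key| hash[key] = {} }
  @lock   = Mutex.new

  def self.mark(module_name, url, state)
    @lock.synchronize { @states[module_name][url] = state }
  end

  # Claim one unprocessed URL and flip it to processing under the lock,
  # so two workers can never claim the same URL (depth is ignored here).
  def self.next_unprocessed(module_name, _max_depth = -1)
    @lock.synchronize do
      url, = @states[module_name].find { |_u, state| state == UNPROCESSED }
      @states[module_name][url] = PROCESSING if url
      url
    end
  end
end

# Trimmed copy of the methods documented on this page.
module CrawlerModule
  def mark_processed(url)
    URLQueue.mark(self.class.name, url, URLQueue::PROCESSED)
  end

  def mark_unprocessed(url)
    URLQueue.mark(self.class.name, url, URLQueue::UNPROCESSED)
  end

  def next_unprocessed(max_depth = -1)
    URLQueue.next_unprocessed(self.class.name, max_depth)
  end
end

class Downloader
  include CrawlerModule

  # Claim URLs one by one until the queue has none left for this class.
  def run
    handled = []
    while (url = next_unprocessed)
      handled << url        # real work on the URL would go here
      mark_processed(url)
    end
    handled
  end
end

worker = Downloader.new
worker.mark_unprocessed('http://example.com/a')
worker.mark_unprocessed('http://example.com/b')
handled = worker.run
```

If the work on a URL fails, calling mark_unprocessed(url) instead of mark_processed(url) would put it back in the queue for a later attempt.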