Class: Scruber::QueueAdapters::Memory

Inherits:
AbstractAdapter
Defined in:
lib/scruber/queue_adapters/memory.rb

Overview

Memory Queue Adapter

Simple queue adapter that stores pages in memory. A good solution for small scrapes: easy to use, with no need to set up a database, but no ability to re-parse pages if something went wrong.

Author:

  • Ivan Goncharov

Defined Under Namespace

Classes: Page


Constructor Details

#initialize(options = {}) ⇒ Scruber::QueueAdapters::Memory

Queue initializer

Parameters:

  • options (Hash) (defaults to: {})

    See AbstractAdapter#initialize



# File 'lib/scruber/queue_adapters/memory.rb', line 58

def initialize(options={})
  super(options)
  @processed_ids = []
  @queue = []
  @downloaded_pages = []
  @error_pages = []
end
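
Below is a minimal, hedged usage sketch (the require and empty options are assumptions; any options given are simply forwarded to AbstractAdapter):

require 'scruber'

# Build an in-memory queue with default options
queue = Scruber::QueueAdapters::Memory.new
queue.size        #=> 0
queue.has_work?   #=> false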

Instance Attribute Details

#error_pages ⇒ Object (readonly)

Returns the value of attribute error_pages.



# File 'lib/scruber/queue_adapters/memory.rb', line 14

def error_pages
  @error_pages
end

Instance Method Details

#add(url_or_page, options = {}) ⇒ void Also known as: push

This method returns an undefined value.

Add page to queue

Parameters:

  • url_or_page (String|Page)

    URL of page or Page object

  • options (Hash) (defaults to: {})

    Other options, see AbstractAdapter::Page



# File 'lib/scruber/queue_adapters/memory.rb', line 72

def add(url_or_page, options={})
  unless url_or_page.is_a?(Page)
    url_or_page = Page.new(self, options.merge(url: url_or_page))
  end
  @queue.push(url_or_page) unless @processed_ids.include?(url_or_page.id) || find(url_or_page.id)
end
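
For illustration, a short sketch of enqueuing pages (the URLs are hypothetical; extra options are forwarded to AbstractAdapter::Page):

queue.add("http://example.com/catalog")           # enqueue by URL
queue.push("http://example.com/catalog?page=2")   # push is an alias of add
queue.size                                        #=> 2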

#add_downloaded(page) ⇒ void

This method returns an undefined value.

Internal method to add page to downloaded queue

Parameters:

  • page (Page)

    page

# File 'lib/scruber/queue_adapters/memory.rb', line 156

def add_downloaded(page)
  @downloaded_pages.push page
end

#add_error_page(page) ⇒ void

This method returns an undefined value.

Internal method to add page to error queue

Parameters:

  • page (Page)

    page

# File 'lib/scruber/queue_adapters/memory.rb', line 166

def add_error_page(page)
  @error_pages.push page
end

#add_processed_page(page) ⇒ void

This method returns an undefined value.

Saves the processed page id to prevent identical pages from being added to the queue again

Parameters:

  • page (Page)

    page



# File 'lib/scruber/queue_adapters/memory.rb', line 177

def add_processed_page(page)
  @processed_ids.push page.id
end

#delete(page) ⇒ void

This method returns an undefined value.

Delete page from all internal queues

Parameters:

  • page (Page)

    page

# File 'lib/scruber/queue_adapters/memory.rb', line 196

def delete(page)
  @queue -= [page]
  @downloaded_pages -= [page]
  @error_pages -= [page]
end
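
A small sketch of removing a page from wherever it currently lives (the page here comes from the pending queue of the earlier examples):

page = queue.fetch_pending
queue.add_error_page(page)   # page now sits in the error queue
queue.delete(page)           # removed from pending, downloaded and error queues
queue.find(page.id)          #=> nil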

#downloaded_count ⇒ Integer

Count of downloaded pages. Used to show downloading progress.

Returns:

  • (Integer)

    count of downloaded pages



# File 'lib/scruber/queue_adapters/memory.rb', line 107

def downloaded_count
  @downloaded_pages.count
end

#fetch_downloaded(count = nil) ⇒ Scruber::QueueAdapters::AbstractAdapter::Page|Array<Scruber::QueueAdapters::AbstractAdapter::Page>

Fetch downloaded and not yet processed pages for parsing

Parameters:

  • count (Integer) (defaults to: nil)

    count of pages to fetch

Returns:

  • (Page|Array<Page>)

    single page when count is nil, otherwise an array of pages

# File 'lib/scruber/queue_adapters/memory.rb', line 116

def fetch_downloaded(count=nil)
  if count.nil?
    @downloaded_pages.shift
  else
    @downloaded_pages.shift(count)
  end
end
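
A sketch of draining the downloaded queue (pages are removed as they are fetched):

page = queue.fetch_downloaded        # next downloaded page, or nil if empty
batch = queue.fetch_downloaded(10)   # up to 10 pages as an array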

#fetch_error(count = nil) ⇒ Scruber::QueueAdapters::AbstractAdapter::Page|Array<Scruber::QueueAdapters::AbstractAdapter::Page>

Fetch page(s) from the error queue

Parameters:

  • count (Integer) (defaults to: nil)

    count of pages to fetch

Returns:

  • (Page|Array<Page>)

    single page when count is nil, otherwise an array of pages

# File 'lib/scruber/queue_adapters/memory.rb', line 129

def fetch_error(count=nil)
  if count.nil?
    @error_pages.shift
  else
    @error_pages.shift(count)
  end
end
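
For example, inspecting failed pages after a run (url is the attribute set when the page was enqueued):

queue.fetch_error(3).each do |page|
  puts "failed: #{page.url}"
end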

#fetch_pending(count = nil) ⇒ Scruber::QueueAdapters::AbstractAdapter::Page|Array<Scruber::QueueAdapters::AbstractAdapter::Page>

Fetch pending page(s) for downloading

Parameters:

  • count (Integer) (defaults to: nil)

    count of pages to fetch

Returns:

  • (Page|Array<Page>)

    single page when count is nil, otherwise an array of pages

# File 'lib/scruber/queue_adapters/memory.rb', line 142

def fetch_pending(count=nil)
  if count.nil?
    @queue.shift
  else
    @queue.shift(count)
  end
end
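
A hedged sketch of the queue side of one scraping cycle (downloading and parsing themselves happen elsewhere, e.g. in a fetcher and a parser block):

page = queue.fetch_pending           # take the next page to download
# ... download page.url with your fetcher of choice ...
queue.add_downloaded(page)           # hand it back for parsing

parsed = queue.fetch_downloaded
# ... parse the page ...
queue.add_processed_page(parsed)     # remember its id so it is not re-enqueued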

#find(id) ⇒ Page

Search for a page by id across the pending, downloaded and error queues

Parameters:

  • id

    id of the page to find

Returns:

  • (Page)

    page with the given id, or nil if not found



# File 'lib/scruber/queue_adapters/memory.rb', line 85

def find(id)
  [@queue, @downloaded_pages, @error_pages].each do |q|
    q.each do |i|
      return i if i.id == id
    end
  end
  nil
end
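
A quick sketch of looking a page up again by its id (searches all three internal queues):

page = queue.fetch_pending
queue.add_downloaded(page)
queue.find(page.id)    #=> page, found in the downloaded queue
queue.find("missing")  #=> nil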

#has_work? ⇒ Boolean

Used by Core. Checks whether there are still pages waiting to be downloaded or parsed.

Returns:

  • (Boolean)

    true if queue still has work for scraper



# File 'lib/scruber/queue_adapters/memory.rb', line 186

def has_work?
  @queue.count > 0 || @downloaded_pages.count > 0
end
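
A simplified sketch of the driving loop the scraper core effectively runs against this predicate (real downloading and parsing omitted):

while queue.has_work?
  if (page = queue.fetch_pending)
    # ... download ...
    queue.add_downloaded(page)
  elsif (page = queue.fetch_downloaded)
    # ... parse, possibly queue.add new pages ...
    queue.add_processed_page(page)
  end
end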

#initialized? ⇒ Boolean

Check if the queue was initialized. Used by the `seed` method: if the queue was already initialized, there is no need to run the seed block.

Returns:

  • (Boolean)

    true if queue already was initialized



# File 'lib/scruber/queue_adapters/memory.rb', line 208

def initialized?
  @queue.present? || @downloaded_pages.present? || @error_pages.present?
end
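
For example, a seed guard in the spirit of the description above (the seed URL is hypothetical):

unless queue.initialized?
  queue.add("http://example.com/start")   # seed only on the very first run
end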

#size ⇒ Integer

Size of the pending queue

Returns:

  • (Integer)

    count of pages in queue



# File 'lib/scruber/queue_adapters/memory.rb', line 98

def size
  @queue.count
end
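
For example, a rough progress line built from the two counters:

puts "pending: #{queue.size}, downloaded: #{queue.downloaded_count}"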