Class: NextUrlsInSQS
- Inherits: Object
- Hierarchy: Object > NextUrlsInSQS
- Defined in: lib/spider/next_urls_in_sqs.rb
Overview
A specialized class that uses Amazon SQS to track the nodes still to be walked. It supports two operations, push and pop: push adds an item to the queue, and pop pulls an item off it.
This is useful if you want multiple Spider processes crawling the same data set.
To use it with Spider use the store_next_urls_with method:
Spider.start_at('http://example.com/') do |s|
  s.store_next_urls_with NextUrlsInSQS.new(AWS_ACCESS_KEY, AWS_SECRET_ACCESS_KEY, queue_name)
end
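As a rough illustration of the two operations, here is a sketch of an in-memory class with the same push and pop surface. NextUrlsInArray is a hypothetical name, not part of the gem. It assumes first-in, first-out order, which a standard SQS queue does not strictly guarantee, and it returns nil when empty rather than waiting for data the way NextUrlsInSQS#pop does.

```ruby
# Hypothetical in-memory store with the same two operations, push and pop.
class NextUrlsInArray
  def initialize
    @items = []
  end

  # Add an item to the back of the queue.
  def push(a_msg)
    @items.push(a_msg)
  end

  # Remove and return the oldest item, or nil when nothing is queued.
  def pop
    @items.shift
  end
end

store = NextUrlsInArray.new
store.push('http://example.com/a')
store.push('http://example.com/b')
first = store.pop
second = store.pop
```

A store like this only works inside one process; sharing the queue between several Spider processes is the reason to use SQS instead.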
Instance Method Summary
- #initialize(aws_access_key, aws_secret_access_key, queue_name = 'ruby-spider') ⇒ NextUrlsInSQS (constructor)
  Construct a new NextUrlsInSQS instance.
- #pop ⇒ Object
  Pull an item off the queue, looping until data is found.
- #push(a_msg) ⇒ Object
  Put data on the queue.
Constructor Details
#initialize(aws_access_key, aws_secret_access_key, queue_name = 'ruby-spider') ⇒ NextUrlsInSQS
Construct a new NextUrlsInSQS instance. The AWS access key and secret access key are passed to RightAws::SqsGen2 (part of the right_aws gem). The optional queue_name sets the Amazon SQS queue name and defaults to 'ruby-spider'.
# File 'lib/spider/next_urls_in_sqs.rb', line 23
def initialize(aws_access_key, aws_secret_access_key, queue_name = 'ruby-spider')
  @sqs = RightAws::SqsGen2.new(aws_access_key, aws_secret_access_key)
  @queue = @sqs.queue(queue_name)
end
Instance Method Details
#pop ⇒ Object
Pull an item off the queue, polling every 5 seconds until data is found. The message body is decoded from YAML.
# File 'lib/spider/next_urls_in_sqs.rb', line 30
def pop
  while true
    message = @queue.pop
    # Decode and return as soon as a message arrives.
    return YAML::load(message.to_s) unless message.nil?
    # The queue was empty: wait before polling again.
    sleep 5
  end
end
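The polling behaviour of #pop can be tried without AWS credentials. The sketch below is an illustration under assumptions, not the gem's code: FakeQueue and pop_blocking are hypothetical names, the fake queue returns nil for a fixed number of polls before yielding a message, and the wait is a parameter so that the example finishes quickly instead of sleeping 5 seconds per empty poll.

```ruby
require 'yaml'

# Hypothetical stand-in for the SQS queue: returns nil for the first
# few polls, then a YAML-encoded message.
class FakeQueue
  def initialize(empty_polls, payload)
    @empty_polls = empty_polls
    @payload = payload
  end

  def pop
    if @empty_polls > 0
      @empty_polls -= 1
      return nil
    end
    @payload
  end
end

# Poll until a message arrives, then decode it from YAML.
# Returns the decoded data together with the number of polls made.
def pop_blocking(queue, wait_seconds)
  polls = 0
  loop do
    polls += 1
    message = queue.pop
    return [YAML.load(message.to_s), polls] unless message.nil?
    sleep wait_seconds
  end
end

queue = FakeQueue.new(2, YAML.dump(['http://example.com/a']))
urls, polls = pop_blocking(queue, 0.01)
```

Because the loop never gives up, a caller blocks for as long as the queue stays empty.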
#push(a_msg) ⇒ Object
Put data on the queue. Data is encoded with YAML.
# File 'lib/spider/next_urls_in_sqs.rb', line 39
def push(a_msg)
  encoded_message = YAML::dump(a_msg)
  # Push the YAML-encoded payload so that #pop can decode it.
  @queue.push(encoded_message)
end
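The YAML encoding is what lets structured data travel through a queue that carries only strings. A minimal sketch of the round trip follows; the shape of a_msg (a hash mapping a page URL to the links found on it) is an assumption made for illustration, not something this page specifies.

```ruby
require 'yaml'

# Assumed message shape, for illustration only.
a_msg = { 'http://example.com/' => ['http://example.com/a', 'http://example.com/b'] }

# What push would place on the queue: a plain string.
encoded = YAML.dump(a_msg)

# What pop would hand back after decoding that string.
decoded = YAML.load(encoded)
```

Hashes, arrays, and strings survive this round trip unchanged; custom objects may need extra care depending on the YAML library version.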