Class: Osprey::Search

Inherits:
Object
Defined in:
lib/osprey/search.rb

Overview

Primary interface to the Osprey Twitter search library.

Constant Summary

DEFAULTS =
{
  :backend => {
    :moneta_klass   => 'Moneta::Memory',
  },
  :rpp => 50,
  :preserved_tweet_ids => 10_000
}

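Instance Attribute Summary

  • #term ⇒ Object (readonly) - Returns the value of attribute term.

Instance Method Summary

  • #fetch ⇒ Object - Returns an Osprey::ResultSet object containing Osprey::Tweet objects for the tweets found for the query.

  • #initialize(term, options = { }) ⇒ Search (constructor) - Initializes the Osprey::Search client.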

Constructor Details

#initialize(term, options = { }) ⇒ Search

Initializes the Osprey::Search client.

Usage (Basic):

search = Osprey::Search.new(term, options)
results = search.fetch

Usage (Custom Backend):

o = Osprey::Search.new('Swine Flu', { :backend => { :moneta_klass => 'Moneta::Memcache', :server => 'localhost:1978' } })
results = o.fetch

Options:

The Osprey::Search library supports options for customizing the parameters that will be passed to Twitter as well as to the local backend for keeping track of state.

  • backend - A Hash of configuration options for the key-value store that keeps track of state between fetches. Keys supported by the backend hash:

    • moneta_klass - The Moneta class used to store the serialized representation of the last fetch of this query. Moneta is a unified interface to key-value storage systems. The value may be a String, in which case the appropriate Moneta library is required automatically, or a class constant. If a class constant is given, the caller must require the appropriate Moneta library before referring to the constant when initializing Osprey::Search. Defaults to 'Moneta::Memory'.

    • (other options) - Any other keys in the backend hash are passed directly to the Moneta class as configuration. See the documentation for the appropriate Moneta class for the options it supports.

  • preserved_tweet_ids - The number of tweet IDs to remember across fetches. Remembering IDs lets Osprey detect when different queries return the same tweet, so that the tweet is not marked as a new record a second time. A higher value means slightly more processing time per new tweet and a small increase in storage. The default of 10,000 is intended as a compromise between performance and keeping duplicate tweets from showing up as new records.

  • rpp - Results per page. Determines the number of results to request from Twitter. Defaults to 50.

  • since_id - Tells Twitter to return only results with an ID greater than the given ID. Added to the URL string automatically if previous results were found.
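
The trade-off described for preserved_tweet_ids can be illustrated with a small bounded ID list. This is only a sketch under assumptions: SeenIds and new_record? are hypothetical names, not part of Osprey, and the library's actual bookkeeping is not shown on this page. The linear lookup is chosen deliberately to show why a larger limit costs more work per tweet.

```ruby
# Hypothetical helper, not part of Osprey: remembers the most recent
# `limit` tweet ids and reports whether an id has been seen before.
class SeenIds
  def initialize(limit)
    @limit = limit
    @ids   = [] # oldest id first
  end

  # Returns true the first time an id is offered, false on repeats.
  def new_record?(id)
    return false if @ids.include?(id) # linear scan: cost grows with limit

    @ids << id
    @ids.shift while @ids.size > @limit # forget the oldest ids beyond the cap
    true
  end
end

seen = SeenIds.new(10_000)
seen.new_record?(101) # first sighting, treated as a new record
seen.new_record?(101) # repeat from another query, not a new record
```

Once an ID has been pushed out of the list it would be reported as new again, which is why a very small limit risks duplicates.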



# File 'lib/osprey/search.rb', line 56

def initialize(term, options = { })
  @term    = term
  @options = options.reverse_merge(DEFAULTS)
  @backend = initialize_backend(@options[:backend][:moneta_klass], @options[:backend].except(:moneta_klass))
end
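
One consequence of the constructor above is worth a quick illustration. reverse_merge (from ActiveSupport) is a shallow merge, equivalent to DEFAULTS.merge(options), so a caller who supplies a :backend hash replaces the default backend hash as a whole. The sketch below uses plain Hash#merge and a stand-in copy of DEFAULTS so that it runs without ActiveSupport; the merged helper is a hypothetical name introduced only for this example.

```ruby
# Stand-in copy of the DEFAULTS constant documented above.
DEFAULTS = {
  :backend => { :moneta_klass => 'Moneta::Memory' },
  :rpp => 50,
  :preserved_tweet_ids => 10_000
}

# Hypothetical helper: options.reverse_merge(DEFAULTS) gives the same
# result as DEFAULTS.merge(options).
def merged(options)
  DEFAULTS.merge(options)
end

top_level = merged(:rpp => 100)
top_level[:rpp]                    # => 100
top_level[:backend][:moneta_klass] # => "Moneta::Memory"

# A partial :backend hash replaces the default one entirely.
partial = merged(:backend => { :server => 'localhost:1978' })
partial[:backend][:moneta_klass]   # => nil
```

How initialize_backend treats a nil class is not shown on this page, so the safe pattern is the one in the Custom Backend usage example: pass :moneta_klass whenever a :backend hash is given.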

Instance Attribute Details

#term ⇒ Object (readonly)

Returns the value of attribute term.



# File 'lib/osprey/search.rb', line 5

def term
  @term
end

Instance Method Details

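#fetch ⇒ Object

Returns an Osprey::ResultSet object containing Osprey::Tweet objects for the tweets found for the query. If Twitter responds with a status code other than 200, a message is written to $stderr and nil is returned.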



# File 'lib/osprey/search.rb', line 63

def fetch
  p_results = previous_results
  
  res = Curl::Easy.perform(url(p_results))
        
  if res.response_code == 200
    parse_tweets(res, p_results)
  else
    $stderr.puts "Received invalid twitter response code of #{res.response_code}."
  end
end
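
fetch hands the previous results to a private url helper, which is how since_id ends up in the request when earlier results exist. That helper's source is not included on this page, so the following is only a sketch of how such a query string might be assembled. SEARCH_URL is a placeholder endpoint and search_url is a hypothetical stand-in, not Osprey's own method.

```ruby
require 'uri'

# Placeholder endpoint; the URL Osprey actually requests is not shown here.
SEARCH_URL = 'http://search.example.com/search.json'

# Hypothetical stand-in for the private url helper called by fetch.
# since_id is only added when a previous result supplied one.
def search_url(term, rpp, since_id = nil)
  params = { :q => term, :rpp => rpp }
  params[:since_id] = since_id if since_id
  "#{SEARCH_URL}?#{URI.encode_www_form(params)}"
end

search_url('Swine Flu', 50)
# => "http://search.example.com/search.json?q=Swine+Flu&rpp=50"
search_url('Swine Flu', 50, 1234)
# => "http://search.example.com/search.json?q=Swine+Flu&rpp=50&since_id=1234"
```

Because fetch returns nil on a non-200 response, callers may want to check the return value before iterating over it.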