stackconnect

Ruby API for making StackOverflow queries. It covers a limited subset of the calls available in the StackOverflow API, mainly those dealing with tags, questions, and users.

Software Versions

This gem was written for, and tested with, Rails 4.0 and above and Ruby 2.0.0 and above.

Installation

In your Gemfile, add this line:

gem 'stackconnect'

And then execute:

$ bundle

Or install it yourself as:

$ gem install stackconnect

If you want the most current version of the gem, in your Gemfile, add this:

gem 'stackconnect', :git => 'https://github.com/stackgems/stackconnect.git'

Usage

These methods are quota restricted. The Stack Exchange API only allows a quota of 10,000 requests per day without an API key: http://api.stackexchange.com/docs/authentication. Once you go over that quota without a key, your IP address will be blocked from making any more requests until the next day.

If you get an API key, your limit will still be 10,000 (per application), with a limit of 5 applications.
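Responses from the API carry 'quota_max' and 'quota_remaining' fields (they appear in the sample data later in this document), so the remaining quota can be checked before sending more requests. A minimal sketch, assuming the response is available as a Ruby Hash; the helper name quota_low? and the threshold of 100 are illustrative and not part of the gem:

```ruby
# Hypothetical helper (not part of stackconnect): returns true when the
# response Hash reports fewer than +threshold+ requests left for the day.
def quota_low?(data, threshold = 100)
  remaining = data["quota_remaining"]
  # If the field is missing we cannot tell, so report false.
  return false if remaining.nil?
  remaining < threshold
end

# Values copied from the sample data in this document.
response = { "quota_max" => 10000, "quota_remaining" => 9973 }
puts quota_low?(response)
```

A caller could stop paging, or wait until the next day, once this starts returning true.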

Retrieve Total Number of Questions

 sc = StackConnect.new
 total = sc.retrieve_total_questions(from_date)

This will return a Ruby Integer (a Fixnum on Ruby versions before 2.4).

from_date has to be a Unix timestamp. For example, to retrieve the total number of questions since yesterday:

 require 'date'
 from_date = (Date.today - 1).to_time.to_i
 total = sc.retrieve_total_questions(from_date)
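Note that Date#to_time uses the local time zone, while the Stack Exchange API expresses dates as Unix epoch seconds in UTC. If that difference matters, the timestamp can be built in UTC explicitly. A small sketch; unix_midnight_utc is a hypothetical helper name, not something the gem provides:

```ruby
require 'date'

# Hypothetical helper (not part of stackconnect): Unix timestamp for
# midnight UTC at the start of the given Date.
def unix_midnight_utc(date)
  Time.utc(date.year, date.month, date.day).to_i
end

# Midnight UTC at the start of yesterday.
yesterday = unix_midnight_utc(Date.today - 1)
puts yesterday
```

The resulting integer can be passed wherever a from_date is expected.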

Retrieve Most Popular Tags of All Time

The default sort is popular, and the order is desc in order to list the most popular tags first.

data = sc.retrieve_most_popular_tags

This will return the JSON response as a data object that you can index into and iterate over. An example:

data["items"].each do |trend|
   puts trend["name"]
end

The data looks like this:

{
  "items": [
    {
      "has_synonyms": false,
      "is_moderator_only": false,
      "is_required": false,
      "count": 127787,
      "name": "c"
    },
    {
      "has_synonyms": true,
      "is_moderator_only": false,
      "is_required": false,
      "count": 268773,
      "name": "python"
    },
    {
      "has_synonyms": true,
      "is_moderator_only": false,
      "is_required": false,
      "count": 18729,
      "name": "variables"
    }
  ],
  "has_more": true,
  "backoff": 10,
  "quota_max": 10000,
  "quota_remaining": 9973
}
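To illustrate working with this structure, the sketch below parses a trimmed copy of the sample above and ranks the tags by their 'count' field. It uses only Ruby's standard json library and does not call the gem; the variable names are illustrative:

```ruby
require 'json'

# A trimmed copy of the sample response above.
raw = <<-EOS
{
  "items": [
    { "count": 127787, "name": "c" },
    { "count": 268773, "name": "python" },
    { "count": 18729,  "name": "variables" }
  ],
  "has_more": true
}
EOS

data = JSON.parse(raw)

# Sort by count, largest first, and keep only the name and count.
sorted = data["items"].sort_by { |tag| -tag["count"] }
ranked = sorted.map { |tag| [tag["name"], tag["count"]] }

ranked.each { |name, count| puts "#{name}: #{count}" }
```

Sorting locally like this is only needed if you want an ordering other than the one the API returned.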

Retrieving the Most Popular Tags from a Particular Date

The way StackOverflow handles date ranges for tags is not what you would expect: the count returned with each tag is the all-time count for that tag, not the count for that particular day or date range. If you want the tag counts for a particular day, you have to parse the data returned by the /questions API call and iterate through all the questions for that day.
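The per-day counting described above could look roughly like the following. This is a sketch under assumptions: it expects a list of question Hashes that each carry a "tags" array, as question objects in the Stack Exchange API do, and it uses made-up questions rather than a real response. count_tags is a hypothetical helper, not part of the gem:

```ruby
# Hypothetical helper (not part of stackconnect): tallies how often each tag
# appears across a list of question Hashes, each carrying a "tags" array.
def count_tags(questions)
  questions.each_with_object(Hash.new(0)) do |question, counts|
    question.fetch("tags", []).each { |tag| counts[tag] += 1 }
  end
end

# Made-up questions standing in for the "items" of a /questions response.
questions = [
  { "title" => "Q1", "tags" => ["ruby", "rails"] },
  { "title" => "Q2", "tags" => ["ruby"] },
  { "title" => "Q3", "tags" => ["python"] }
]

counts = count_tags(questions)
counts.sort_by { |_, n| -n }.each { |tag, n| puts "#{tag}: #{n}" }
```

For a full day of questions the input would have to be gathered across every page of results first.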

When you give a date range for the /tags call, the 'todate' parameter is ignored, and the response is the same as if it had not been included at all. However, querying StackOverflow with a 'fromdate' and sorting on 'popular' will return the most popular tags for that particular time. For that reason, retrieve_tags(from_date) sorts on 'popular' by default; with any other sort, the data isn't very meaningful.

The date range is also restricted to dates after January 1st, 2009. If an earlier date is given in the query, an exception will be raised.
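A check of this kind can also be made on the caller's side before a request is sent. The sketch below is not the gem's implementation: validate_from_date is a hypothetical name, and treating midnight UTC on January 1st, 2009 itself as acceptable is an assumption.

```ruby
# Hypothetical guard (not part of stackconnect): raises if the timestamp
# falls before 2009-01-01 00:00:00 UTC, otherwise returns it unchanged.
# Accepting the boundary value itself is an assumption.
def validate_from_date(from_date)
  earliest = Time.utc(2009, 1, 1).to_i
  if from_date < earliest
    raise ArgumentError, "from_date must not be before January 1st, 2009"
  end
  from_date
end

from_date = validate_from_date(Time.utc(2014, 3, 1).to_i)
puts from_date
```

Validating early like this avoids spending a request from the quota on a query that cannot succeed.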

If the API is called with a sort type of 'popular' or 'name' and the fromdate is before January 1st, 2009, StackOverflow will return the most popular tags of all time. If the sort type is 'activity', no data will be returned.

The order is also 'desc' (descending) by default, so that the most popular tags are shown first.

data = sc.retrieve_tags(from_date)

Sample Data:

{
  "items": [
    {
      "has_synonyms": false,
      "is_moderator_only": false,
      "is_required": false,
      "count": 1,
      "name": "microsoft-net-http"
    },
    {
      "has_synonyms": false,
      "is_moderator_only": false,
      "is_required": false,
      "count": 1,
      "name": "mlt"
    }
  ],
  "has_more": true,
  "quota_max": 10000,
  "quota_remaining": 9972
}

Retrieving User Data

This call will retrieve all-time user data, going back to the creation of the site. When making this API call, check the 'has_more' attribute to see if there are more pages. If 'has_more' is true, increase the 'page' parameter and call the function again to retrieve the next page of data.

The query is called with a sort on 'reputation', and ordered in descending order. Therefore, the users with the highest reputation will appear first.

  sc = StackConnect.new
  data = sc.retrieve_users(1)

Sample data:

{
  "items": [
    {
      "badge_counts": {
        "bronze": 5498,
        "silver": 4081,
        "gold": 261
      },
      "account_id": 11683,
      "is_employee": false,
      "last_modified_date": 1394538684,
      "last_access_date": 1394629373,
      "reputation_change_year": 21465,
      "reputation_change_quarter": 21465,
      "reputation_change_month": 3600,
      "reputation_change_week": 1085,
      "reputation_change_day": 215,
      "reputation": 656402,
      "creation_date": 1222430705,
      "user_type": "registered",
      "user_id": 22656,
      "age": 37,
      "accept_rate": 83,
      "location": "Reading, United Kingdom",
      "website_url": "http://csharpindepth.com",
      "link": "http://stackoverflow.com/users/22656/jon-skeet",
      "display_name": "Jon Skeet",
      "profile_image": "https://www.gravatar.com/avatar/6d8ebb117e8d83d74ea95fbdd0f87e13?s=128&d=identicon&r=PG"
    }
  ],
  "has_more": true,
  "quota_max": 10000,
  "quota_remaining": 9990
}
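The date fields in this sample ('creation_date', 'last_access_date', 'last_modified_date') are Unix timestamps. A short sketch of turning them into readable dates with Ruby's Time class, using values copied from the sample; nothing here calls the gem:

```ruby
# Fields taken from the sample user above.
user = {
  "display_name" => "Jon Skeet",
  "creation_date" => 1222430705,
  "last_access_date" => 1394629373
}

# Time.at interprets the integer as seconds since the Unix epoch;
# .utc makes the result independent of the local time zone.
created = Time.at(user["creation_date"]).utc
last_seen = Time.at(user["last_access_date"]).utc

puts "#{user['display_name']} joined on #{created.strftime('%Y-%m-%d')}"
puts "Last seen on #{last_seen.strftime('%Y-%m-%d')}"
```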

The sample data here only has one item. Normally, you'd get the first 30 results. In order to check if there is more data:

has_more = data["has_more"]

If 'has_more' returns true, then increase the page count, and call again:

page = 1
loop do
  data = sc.retrieve_users(page)

  # work with data
  # ..

  # stop once the API reports that there are no more pages
  break unless data["has_more"]
  page += 1
  # pause between requests to avoid being throttled
  sleep(1)
end

Make sure that you call sleep() for at least one second before making each call. Otherwise, the StackExchange API will throttle your requests and require you to wait a certain amount of time before making another request. Increasing this time to about 10 seconds will usually keep your calls from being throttled for the remainder of the pages.
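Responses may also include a 'backoff' field, as in the tags sample earlier in this document, giving the number of seconds the API asks you to wait. The sketch below combines that with the paging loop. It is an illustration only: seconds_to_wait and collect_all_items are hypothetical helpers, not part of the gem, and the block passed in is assumed to return the response as a Hash for a given page number.

```ruby
# Seconds to wait before the next request: the larger of our own minimum
# delay and the "backoff" value the API may include in a response.
def seconds_to_wait(data, minimum = 1)
  [data.fetch("backoff", 0).to_i, minimum].max
end

# Collects "items" from every page. The block receives a page number and
# must return the response Hash. +sleeper+ is injectable so the loop can
# be exercised without real delays.
def collect_all_items(minimum_delay = 1, sleeper = method(:sleep))
  items = []
  page = 1
  loop do
    data = yield(page)
    items.concat(data.fetch("items", []))
    break unless data["has_more"]
    sleeper.call(seconds_to_wait(data, minimum_delay))
    page += 1
  end
  items
end

# Stand-in pages instead of live API responses.
fake_pages = {
  1 => { "items" => ["a", "b"], "has_more" => true, "backoff" => 10 },
  2 => { "items" => ["c"], "has_more" => false }
}
waits = []
all_items = collect_all_items(1, lambda { |s| waits << s }) { |page| fake_pages[page] }
puts all_items.inspect
```

With the gem, the block would be something like { |page| sc.retrieve_users(page) } and the default sleeper would perform the real pause.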

Contributing

Fork it ( http://github.com/[my-github-username]/stack-connect/fork )
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create new Pull Request