Class: ScraperWiki::API

Inherits:
Object
Includes:
HTTParty
Defined in:
lib/scraperwiki-api.rb,
lib/scraperwiki-api/version.rb,
lib/scraperwiki-api/matchers.rb

Overview

A Ruby wrapper for the ScraperWiki API.

Defined Under Namespace

Modules: Matchers

Constant Summary

RUN_INTERVALS =
{
  :never   => -1,
  :monthly => 2678400,
  :weekly  => 604800,
  :daily   => 86400,
  :hourly  => 3600,
}
VERSION =
"0.0.7"

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(apikey = nil) ⇒ API

Initializes a ScraperWiki API object.



# File 'lib/scraperwiki-api.rb', line 37

def initialize(apikey = nil)
  @apikey = apikey
end
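
A minimal usage sketch. The require path matches the file listed above; the key is a hypothetical placeholder, and it can be omitted entirely since +apikey+ defaults to nil:

require 'scraperwiki-api'

api = ScraperWiki::API.new('your-api-key')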

Class Method Details

.edit_scraper_url(shortname) ⇒ String

Returns the URL to edit the scraper.

Parameters:

  • shortname (String)

    the scraper's shortname

Returns:

  • (String)

    the URL to edit the scraper



# File 'lib/scraperwiki-api.rb', line 31

def edit_scraper_url(shortname)
  "https://scraperwiki.com/scrapers/#{shortname}/edit/"
end

.scraper_url(shortname) ⇒ String

Returns the URL to the scraper's overview.

Parameters:

  • shortname (String)

    the scraper's shortname

Returns:

  • (String)

    the URL to the scraper's overview



# File 'lib/scraperwiki-api.rb', line 23

def scraper_url(shortname)
  "https://scraperwiki.com/scrapers/#{shortname}/"
end
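
Both URL helpers are plain string builders, so no API key is needed; a sketch with a hypothetical shortname:

ScraperWiki::API.scraper_url('example-scraper')
# => "https://scraperwiki.com/scrapers/example-scraper/"

ScraperWiki::API.edit_scraper_url('example-scraper')
# => "https://scraperwiki.com/scrapers/example-scraper/edit/"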

Instance Method Details

#datastore_sqlite(shortname, query, opts = {}) ⇒ Array, ...

Note:

The query string parameter is +name+, not +shortname+ as in the ScraperWiki docs.

Queries and extracts data via a general purpose SQL interface.

To produce an RSS feed, use SQL's +AS+ keyword (e.g. "SELECT name AS description") to alias columns as +title+, +link+, +description+, +guid+ (optional; +link+ is used if it is absent) and +pubDate+ or +date+, as in the sketch below.
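
A sketch of such a query; the scraper shortname and the source column names are hypothetical:

api.datastore_sqlite('example-scraper',
  'SELECT name AS title, url AS link, summary AS description, updated AS pubDate FROM swdata',
  :format => 'rss2')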

+jsondict+ example output:

[
  {
    "fieldA": "valueA",
    "fieldB": "valueB",
    "fieldC": "valueC",
  },
  ...
]

+jsonlist+ example output:

{
  "keys": ["fieldA", "fieldB", "fieldC"],
  "data": [
    ["valueA", "valueB", "valueC"],
    ...
  ]
}

+csv+ example output:

fieldA,fieldB,fieldC
valueA,valueB,valueC
...

Parameters:

  • shortname (String)

    the scraper's shortname (as it appears in the URL)

  • query (String)

    a SQL query

  • opts (Hash) (defaults to: {})

    optional arguments

Options Hash (opts):

  • :format (String)

    one of "jsondict", "jsonlist", "csv", "htmltable" or "rss2"

  • :attach (Array, String)

    ";"-delimited list of shortnames of other scrapers whose data you need to access

Returns:

  • (Array, Hash, String)




# File 'lib/scraperwiki-api.rb', line 86

def datastore_sqlite(shortname, query, opts = {})
  if Array === opts[:attach]
    opts[:attach] = opts[:attach].join ';'
  end
  request_with_apikey '/datastore/sqlite', {:name => shortname, :query => query}.merge(opts)
end
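
A usage sketch with hypothetical shortnames; as the code above shows, an Array passed as :attach is joined with ";" for you:

api = ScraperWiki::API.new
api.datastore_sqlite('example-scraper', 'SELECT * FROM swdata LIMIT 10',
  :format => 'jsondict',
  :attach => ['other-scraper-a', 'other-scraper-b'])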

#scraper_getinfo(shortname, opts = {}) ⇒ Array

Note:

Returns an array, although it seems always to contain only one item

Note:

The +tags+ field seems always to be an empty array.

Note:

Fields like +last_run+ seem to follow British Summer Time.

Note:

The query string parameter is +name+, not +shortname+ as in the ScraperWiki docs.

Extracts data about a scraper's code, owner, history, etc.

  • +runid+ is a Unix timestamp (with microseconds) and a UUID, joined by an underscore.
  • The value of +records+ is the same as that of +total_rows+ under +datasummary+.
  • +run_interval+ is the number of seconds between runs (see RUN_INTERVALS). It is one of:
    • -1 (never)
    • 2678400 (monthly)
    • 604800 (weekly)
    • 86400 (daily)
    • 3600 (hourly)
  • +privacy_status+ is one of:
    • "public" (everyone can see and edit the scraper and its data)
    • "visible" (everyone can see the scraper, but only contributors can edit it)
    • "private" (only contributors can see and edit the scraper and its data)
  • An individual +runevents+ hash will have an +exception_message+ key if there was an error during that run.

Example output:

[
  {
    "code": "require 'nokogiri'\n...",
    "datasummary": {
      "tables": {
        "swdata": {
          "keys": [
            "fieldA",
            ...
          ],
          "count": 42,
          "sql": "CREATE TABLE `swdata` (...)"
        },
        "swvariables": {
          "keys": [
            "value_blob",
            "type",
            "name"
          ],
          "count": 2,
          "sql": "CREATE TABLE `swvariables` (`value_blob` blob, `type` text, `name` text)"
        },
        ...
      },
      "total_rows": 44,
      "filesize": 1000000
    },
    "description": "Scrapes websites for data.",
    "language": "ruby",
    "title": "Example scraper",
    "tags": [],
    "short_name": "example-scraper",
    "userroles": {
      "owner": [
        "johndoe"
      ],
      "editor": [
        "janedoe",
        ...
      ]
    },
    "last_run": "1970-01-01T00:00:00",
    "created": "1970-01-01T00:00:00",
    "runevents": [
      {
        "still_running": false,
        "pages_scraped": 5,
        "run_started": "1970-01-01T00:00:00",
        "last_update": "1970-01-01T00:00:00",
        "runid": "1325394000.000000_xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx",
        "records_produced": 42
      },
      ...
    ],
    "records": 44,
    "wiki_type": "scraper",
    "privacy_status": "visible",
    "run_interval": 604800,
    "attachable_here": [],
    "attachables": [],
    "history": [
      ...,
      {
        "date": "1970-01-01T00:00:00",
        "version": 0,
        "user": "johndoe",
        "session": "Thu, 1 Jan 1970 00:00:08 GMT"
      }
    ]
  }
]

Parameters:

  • shortname (String)

    the scraper's shortname (as it appears in the URL)

  • opts (Hash) (defaults to: {})

    optional arguments

Options Hash (opts):

  • :version (String)

    version number (-1 for most recent) [default -1]

  • :history_start_date (String)

restricts +history+ and +runevents+ to this date or later (format: YYYY-MM-DD)

  • :quietfields (Array, String)

    "|"-delimited list of fields to exclude from the output. Must be a subset of 'code|runevents|datasummary|userroles|history'

Returns:

  • (Array)


# File 'lib/scraperwiki-api.rb', line 198

def scraper_getinfo(shortname, opts = {})
  if Array === opts[:quietfields]
    opts[:quietfields] = opts[:quietfields].join '|'
  end
  request_with_apikey '/scraper/getinfo', {:name => shortname}.merge(opts)
end
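
A usage sketch with a hypothetical shortname; as the code above shows, an Array passed as :quietfields is joined with "|" for you:

api.scraper_getinfo('example-scraper',
  :quietfields => ['code', 'history']).first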

#scraper_getruninfo(shortname, opts = {}) ⇒ Array

Note:

Returns an array, although it seems always to contain only one item

Note:

The query string parameter is +name+, not +shortname+ as in the ScraperWiki docs.

See what the scraper did during each run.

Example output:

[
  {
    "run_ended": "1970-01-01T00:00:00",
    "first_url_scraped": "http://www.iana.org/domains/example/",
    "pages_scraped": 5,
    "run_started": "1970-01-01T00:00:00",
    "runid": "1325394000.000000_xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx",
    "domainsscraped": [
      {
        "domain": "http://example.com",
        "bytes": 1000000,
        "pages": 5
      },
      ...
    ],
    "output": "...",
    "records_produced": 42
  }
]

Parameters:

  • shortname (String)

    the scraper's shortname (as it appears in the URL)

  • opts (Hash) (defaults to: {})

    optional arguments

Options Hash (opts):

  • :runid (String)

    a run ID

Returns:

  • (Array)


# File 'lib/scraperwiki-api.rb', line 237

def scraper_getruninfo(shortname, opts = {})
  request_with_apikey '/scraper/getruninfo', {:name => shortname}.merge(opts)
end
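
A usage sketch with a hypothetical shortname; the run ID is the placeholder from the example output above:

api.scraper_getruninfo('example-scraper')
api.scraper_getruninfo('example-scraper',
  :runid => '1325394000.000000_xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx')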

#scraper_getuserinfo(username) ⇒ Array

Note:

Returns an array, although it seems always to contain only one item

Note:

The date joined field is +date_joined+ (with underscore) on #scraper_usersearch.

Find out information about a user.

Example output:

[
  {
    "username": "johndoe",
    "profilename": "John Doe",
    "coderoles": {
      "owner": [
        "johndoe.emailer",
        "example-scraper",
        ...
      ],
      "email": [
        "johndoe.emailer"
      ],
      "editor": [
        "yet-another-scraper",
        ...
      ]
    },
    "datejoined": "1970-01-01T00:00:00"
  }
]

Parameters:

  • username (String)

    a username

Returns:

  • (Array)


# File 'lib/scraperwiki-api.rb', line 273

def scraper_getuserinfo(username)
  request_with_apikey '/scraper/getuserinfo', :username => username
end
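
A usage sketch using the username from the example output:

api.scraper_getuserinfo('johndoe').first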

#scraper_search(opts = {}) ⇒ Array

Search the titles and descriptions of all the scrapers.

Example output:

[
  {
    "description": "Scrapes websites for data.",
    "language": "ruby",
    "created": "1970-01-01T00:00:00",
    "title": "Example scraper",
    "short_name": "example-scraper",
    "privacy_status": "public"
  },
  ...
]

Parameters:

  • opts (Hash) (defaults to: {})

    optional arguments

Options Hash (opts):

  • :searchquery (String)

    search terms

  • :maxrows (Integer)

    number of results to return [default 5]

  • :requestinguser (String)

    the name of the user making the search, which changes the order of the matches

Returns:

  • (Array)


# File 'lib/scraperwiki-api.rb', line 299

def scraper_search(opts = {})
  request_with_apikey '/scraper/search', opts
end
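
A usage sketch with hypothetical search terms:

api.scraper_search(:searchquery => 'example', :maxrows => 10)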

#scraper_usersearch(opts = {}) ⇒ Array

Note:

The date joined field is +datejoined+ (without underscore) on #scraper_getuserinfo.

Search for a user by name.

Example output:

[
  {
    "username": "johndoe",
    "profilename": "John Doe",
    "date_joined": "1970-01-01T00:00:00"
  },
  ...
]

Parameters:

  • opts (Hash) (defaults to: {})

    optional arguments

Options Hash (opts):

  • :searchquery (String)

    search terms

  • :maxrows (Integer)

    number of results to return [default 5]

  • :nolist (Array, String)

    space-separated list of usernames to exclude from the output

  • :requestinguser (String)

    the name of the user making the search, which changes the order of the matches

Returns:

  • (Array)


# File 'lib/scraperwiki-api.rb', line 327

def scraper_usersearch(opts = {})
  if Array === opts[:nolist]
    opts[:nolist] = opts[:nolist].join ' '
  end
  request '/scraper/usersearch', opts
end
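
A usage sketch with hypothetical usernames; as the code above shows, an Array passed as :nolist is joined with spaces for you:

api.scraper_usersearch(:searchquery => 'doe',
  :nolist => ['johndoe', 'janedoe'])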