Class: Ailurus::Dataset

Inherits:
Object
Defined in:
lib/ailurus/dataset.rb,
lib/ailurus/dataset/create.rb,
lib/ailurus/dataset/search.rb,
lib/ailurus/dataset/update.rb,
lib/ailurus/dataset/metadata.rb

Overview

Public: A class corresponding to a PANDA Dataset.

client - An Ailurus::Client instance (see `/lib/ailurus/client.rb`).
slug   - The slug to a PANDA Dataset, as described at
         http://panda.readthedocs.org/en/1.1.1/api.html#datasets

Constructor Details

#initialize(client, slug) ⇒ Dataset

Returns a new instance of Dataset.



# File 'lib/ailurus/dataset.rb', line 15

def initialize(client, slug)
  @client = client
  @slug = slug
end

Instance Attribute Details

#client ⇒ Object

Returns the value of attribute client.



# File 'lib/ailurus/dataset.rb', line 13

def client
  @client
end

#slug ⇒ Object

Returns the value of attribute slug.



# File 'lib/ailurus/dataset.rb', line 13

def slug
  @slug
end

Instance Method Details

#create(columns = [], additional_params = {}) ⇒ Object

Public: Create this Dataset on the server.

This instance’s @slug will also be used as its `name`, as defined in the API.

columns           - An Array of Hashes, one per column, about columns
                    expected to be in the Dataset. Each Hash SHOULD contain
                    at least a :name, but it also MAY contain a :type (e.g.,
                    "int", "unicode", "bool"--default is "unicode") and/or
                    :index (true or false, depending on whether you want the
                    column to be indexed--default is false) (default: none).
additional_params - A Hash of other properties to set on the Dataset, such
                    as description and title (default: none).

Returns a metadata object, such as the one returned by Dataset#metadata.



# File 'lib/ailurus/dataset/create.rb', line 19

def create(columns = [], additional_params = {})
  # Start with the bare minimum.
  payload = {
    "name" => @slug,
    "slug" => @slug
  }

  # Add the columns. This requires the addition of up to three separate
  # parameters, each comma-delimited and in a consistent order.
  column_info = {}
  unless columns.empty?
    column_info["columns"] = columns.each_with_index.map do |column, index|
      column.fetch(:name, "column_#{index}")
    end.join(",")
    column_info["column_types"] = columns.map do |column|
      column.fetch(:type, "unicode")
    end.join(",")
    column_info["typed_columns"] = columns.map do |column|
      # FIXME: Probably should check whether non-false values _actually_
      # are true.
      column.fetch(:index, false).to_s
    end.join(",")
  end

  # Add other properties as specified.
  payload.merge!(additional_params)

  # Let's do this thing!
  @client.make_request(
    "/api/1.0/dataset/", :method => :post,
    :query => column_info, :body => payload)
end
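
The column handling above can be tried in isolation. The following is a minimal sketch that copies only that step into a standalone helper; `column_query_params` is a hypothetical name used for illustration and is not part of Ailurus, and no request is made.

```ruby
# Hypothetical helper (not part of Ailurus) that mirrors how Dataset#create
# turns an Array of column Hashes into three comma-delimited parameters.
def column_query_params(columns)
  return {} if columns.empty?

  {
    # Columns without a :name fall back to "column_<position>".
    "columns" => columns.each_with_index.map { |column, index|
      column.fetch(:name, "column_#{index}")
    }.join(","),
    # Columns without a :type are treated as "unicode".
    "column_types" => columns.map { |column|
      column.fetch(:type, "unicode")
    }.join(","),
    # Columns without an :index flag are not indexed.
    "typed_columns" => columns.map { |column|
      column.fetch(:index, false).to_s
    }.join(",")
  }
end

columns = [
  { :name => "city", :index => true },
  { :name => "population", :type => "int" },
  { :type => "bool" }
]
params = column_query_params(columns)
# params["columns"]       is "city,population,column_2"
# params["column_types"]  is "unicode,int,bool"
# params["typed_columns"] is "true,false,false"
```

All three strings list the columns in the same order, which is what lets the server line up each name with its type and index flag.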

#data_page(query = nil, page_num = 0, rows_per_page = 100, additional_params = {}) ⇒ Object

Internal: Retrieve a set of rows from the Dataset, specified by page number and page length.

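query             - A query string to use when searching the data.
page_num          - The 0-indexed page number of data to retrieve.
rows_per_page     - The number of rows to include on each page.
additional_params - A Hash of other query parameters to pass along to
                    Dataset#data_rows (default: none).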

Returns an Array of Arrays.



# File 'lib/ailurus/dataset/search.rb', line 47

def data_page(
    query = nil, page_num = 0, rows_per_page = 100, additional_params = {})
  # Translate the page number into a row offset, then delegate.
  data_rows(query, page_num * rows_per_page, rows_per_page, additional_params)
end
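
The only work this method does is convert a page number into an offset. A minimal sketch of that arithmetic follows; `page_window` is a hypothetical helper name chosen for illustration, not an Ailurus method.

```ruby
# Hypothetical helper (not part of Ailurus) showing how a 0-indexed page
# number maps onto the offset and limit that Dataset#data_rows expects.
def page_window(page_num, rows_per_page = 100)
  { :offset => page_num * rows_per_page, :limit => rows_per_page }
end

first_page  = page_window(0)      # offset 0, limit 100
fourth_page = page_window(3, 25)  # offset 75, limit 25
```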

#data_rows(query = nil, offset = 0, limit = 100, additional_params = {}) ⇒ Object

Internal: Retrieve a set of rows from the Dataset, specified by offset and length.

query             - A query string to use when searching the data.
offset            - The number of rows to exclude from the beginning of the
                    results before returning what follows; for example, to
                    get the last third of a 30-row set, you would need an
                    offset of 20.
limit             - The maximum number of rows to return, after honoring the
                    offset; for example, to get the last third of a 30-row
                    set, you would need a limit of 10.
additional_params - A Hash of other query parameters to merge into the
                    request (default: none).

Returns an Array of Arrays.



# File 'lib/ailurus/dataset/search.rb', line 15

def data_rows(query = nil, offset = 0, limit = 100, additional_params = {})
  endpoint = "/api/1.0/dataset/#{slug}/data/"
  params = {
    "offset" => offset,
    "limit" => limit
  }
  if query.nil?
    raise NotImplementedError, (
      "API returns unexpected results without a query present, so query is " \
      "required for now.")
  else
    params["q"] = query
  end

  params.merge!(additional_params)

  res = @client.make_request(endpoint, :query => params)
  if res.objects.empty? && res.meta.next.nil?
    raise RangeError, "No data available for offset #{offset}"
  end

  res.objects.map { |row| row.data }
end
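
To make the offset and limit semantics concrete, here is a small sketch that applies the same window to an in-memory Array instead of a PANDA server. The `window` helper and the sample rows are assumptions for illustration. The empty-result check is also simplified: the real method additionally requires that the response has no next page before raising.

```ruby
# Hypothetical stand-in (not part of Ailurus): applies an offset/limit window
# to an in-memory Array and raises RangeError when nothing is left, roughly
# as Dataset#data_rows does for an out-of-range offset.
def window(rows, offset, limit)
  page = rows.drop(offset).first(limit)
  raise RangeError, "No data available for offset #{offset}" if page.empty?
  page
end

# Thirty sample rows, each an Array with a single value.
rows = (1..30).map { |n| ["row #{n}"] }

# The last third of a 30-row set: skip 20 rows, then take up to 10.
last_third = window(rows, 20, 10)
# last_third.first is ["row 21"] and last_third.last is ["row 30"]
```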

#get_indexed_name(field_name) ⇒ Object

Public: Get the indexed name for a field so you can perform more detailed searches if desired.

field_name - A String matching the name of a column in the Dataset.

Returns a String or nil, depending on whether the field is indexed.



# File 'lib/ailurus/dataset/metadata.rb', line 25

def get_indexed_name(field_name)
  column_schema = self.metadata.column_schema
  indexed_names_by_column_name = Hash[column_schema.map do |schema_entry|
    [schema_entry.name, schema_entry.indexed_name]
  end]
  indexed_names_by_column_name[field_name]
end
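
The lookup can be illustrated without a server. In this sketch the schema entries are built by hand from a Struct, and the indexed name shown is invented for the example; real values come from the `column_schema` in the Dataset's metadata. `indexed_name_for` is a hypothetical helper name.

```ruby
# Hypothetical helper (not part of Ailurus) that mirrors the lookup in
# Dataset#get_indexed_name, given a column schema directly.
def indexed_name_for(column_schema, field_name)
  lookup = Hash[column_schema.map { |entry| [entry.name, entry.indexed_name] }]
  lookup[field_name]
end

# Stand-in for the schema entries returned by the API (values are made up).
entry = Struct.new(:name, :indexed_name)
schema = [
  entry.new("city", "column_unicode_city"),  # indexed column
  entry.new("notes", nil)                    # column without an index
]

indexed_name_for(schema, "city")   # the indexed name String
indexed_name_for(schema, "notes")  # nil, because the column is not indexed
```

A name that does not match any column also yields nil, so a nil result alone does not distinguish an unindexed column from a missing one.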

#metadata ⇒ Object

Public: Retrieve metadata about this Dataset.

TODO: Figure out a good way to cache this so we don’t keep hitting it.

Returns a Hash, or nil if the server’s response cannot be parsed as JSON.



# File 'lib/ailurus/dataset/metadata.rb', line 10

def metadata
  endpoint = "/api/1.0/dataset/#{@slug}/"
  begin
    @client.make_request(endpoint)
  rescue JSON::JSONError
    nil
  end
end

#search(query = nil, options = {}) ⇒ Object

Public: Search the Dataset with a given query.

Queries currently are required due to some observed problems with the PANDA API. See Dataset#data_rows.

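query   - A query string to use when searching the data.
options - A Hash of optional settings (default: none):
          :max_results - The maximum number of rows to return
                         (default: no limit).
          :options     - A Hash of additional query parameters passed
                         through to Dataset#data_page (default: none).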

Returns an Array of Arrays.



# File 'lib/ailurus/dataset/search.rb', line 64

def search(query = nil, options = {})
  # Handle optional arguments.
  max_results = options.fetch(:max_results, nil)
  additional_params = options.fetch(:options, {})

  rows = []
  page_num = 0

  while true  # Warning: Infinite loop! Remember to break.
    # Get the current page of results. If there aren't any results on that
    # page, we're done.
    begin
      rows.concat(data_page(query, page_num, 100, additional_params))
    rescue RangeError
      break  # Escape the infinite loop!
    end

    # If we have at least as many results as we're supposed to return,
    # we're done! (Truncating as necessary, of course.)
    if !max_results.nil? && rows.length >= max_results
      rows.slice!(max_results, rows.length - max_results)
      break  # Escape the infinite loop!
    end

    # Move on to the next page.
    page_num += 1
  end

  rows
end
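
The paging loop can be exercised without a PANDA server. The sketch below keeps the same control flow but reads pages from an in-memory Array; `fake_page` and `collect_rows` are hypothetical names for illustration and are not part of Ailurus.

```ruby
# Hypothetical stand-in for Dataset#data_page: serves pages from an in-memory
# Array and raises RangeError once the pages run out.
def fake_page(all_rows, page_num, rows_per_page)
  page = all_rows.drop(page_num * rows_per_page).first(rows_per_page)
  raise RangeError, "No data available for page #{page_num}" if page.empty?
  page
end

# Same loop shape as Dataset#search: keep fetching pages until one is out of
# range or enough rows have been collected.
def collect_rows(all_rows, rows_per_page, max_results = nil)
  rows = []
  page_num = 0

  loop do
    begin
      rows.concat(fake_page(all_rows, page_num, rows_per_page))
    rescue RangeError
      break  # No more pages.
    end

    if !max_results.nil? && rows.length >= max_results
      rows = rows.first(max_results)  # Truncate any overshoot.
      break
    end

    page_num += 1
  end

  rows
end

data = (1..7).map { |n| [n] }
everything = collect_rows(data, 3)     # all 7 rows, fetched as 3 pages
first_four = collect_rows(data, 3, 4)  # [[1], [2], [3], [4]]
```

Because whole pages are fetched before truncation, a small `max_results` still costs at least one full page request in this design.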

#update(rows = []) ⇒ Object

Public: Update the data in this Dataset.

rows - An Array of data Hashes containing the following properties:

"objects"     - An Array of Strings in the same order as this
                Dataset's columns. If you don't know what order
                your columns are in, call Dataset#metadata and
                check the result's `column_schema` attribute.
"external_id" - An optional String identifying this row of data.
                Providing an external ID will allow future calls
                to Dataset#update to update this row with new
                information (assuming the same ID is used for one
                of its rows) rather than create a new row
                altogether. See http://bit.ly/1zeeax1 for more
                information.

Returns an OpenStruct describing the rows that were created and/or updated.



# File 'lib/ailurus/dataset/update.rb', line 20

def update(rows = [])
  @client.make_request(
    "/api/1.0/dataset/#{@slug}/data/",
    :method => :put,
    :body => {
      "objects" => rows
    })
end